Abstract

A number of computations exist, especially in the areas of error-control coding
and matrix computations, whose underlying data flow graphs are based on finite
projective-geometry based balanced bipartite graphs. Many of these applications
of projective geometry are being actively researched, especially in the
area of coding theory. Almost all these applications need bipartite graphs of the
order of tens of thousands in practice, whose nodes represent parallel computations.
To reduce implementation cost, reducing the amount of system/hardware
resources during design is an important engineering objective. In this context, we
present a scheme to reduce resource utilization when performing computations
derived from projective geometry (PG) based graphs. In a fully parallel design
based on PG concepts, the number of processing units is equal to the number
of vertices, each performing an atomic computation. To reduce the number of
processing units used for implementation, we present an easy way of partitioning
the vertex set assigned to various atomic computations into blocks. Each block
of the partition is then assigned to a processing unit. A processing unit performs
the computations corresponding to the vertices in the block assigned to it in a
sequential fashion, thus creating the effect of folding the overall computation.
These blocks belong to certain subspaces of the projective space, thus inheriting
symmetric properties that enable us to develop a conflict-free schedule. Moreover,
the partition is constructed using simple coset decomposition. The folding
scheme achieves the best possible throughput, since there is no overhead of shuffling
data across memories while scheduling another computation on the same
processing unit. As such, we have developed multiple new folding schemes for
such graphs. This paper reports two folding schemes, both based on the same
lattice embedding approach, which relies on partitioning. We first provide a scheme,
based on lattice embedding, for a projective space of dimension five, and the
corresponding schedules. Both the folding schemes that we present have been
verified by both simulation and hardware prototyping for different applications.
For example, a semi-parallel decoder architecture for a new class of expander
codes was designed and implemented using this scheme, with potential deployment
in CD-ROM/DVD-R drives. We later generalize this scheme to arbitrary
projective spaces.

Keywords: Projective Geometry, Parallel Scheduling and Semi-parallel Architecture 

1 Introduction 

A number of naturally parallel computations make use of balanced bipartite graphs
arising from finite projective geometry [5], [1], [10], [1], and related structures [8], [9], [7]
to represent their data flows. Many of them are, in fact, recent research directions, e.g.
[1]. These bipartite graphs are generally based on point-hyperplane incidence
relationships of a certain projective space. As the dimension of the projective space is
increased, the corresponding graphs grow both in size and order. Each vertex of the
graph represents a processing unit, and all the vertices on one side of the graph can
compute in parallel, since there are no data dependencies/edges between vertices that
belong to one side of a bipartite graph. The number of such parallel processing units
is generally of the order of tens of thousands in practice for various reasons.
It is well known in the area of error-control coding that the longer an error
correction code is, the closer it operates to the Shannon limit of capacity of a transmission
channel [1]. The length of a code corresponds to the size of a particular bipartite graph,
the Tanner graph, which is also the data flow graph for the decoding system [1]. Similarly,
in matrix computations, especially LU/Cholesky decomposition for solving systems of
linear equations, and iterative PDE solving (and the sparse matrix vector multiplication
sub-problem within) using the conjugate gradient algorithm, the matrix sizes involved
can be of a similarly high order. A PG-based parallel data distribution can be imposed
using suitable interconnection of processors to provide optimal computation time [10],
which can result in a very large setup (as large as a petaflop supercomputer). This setup
is being targeted at Computational Research Labs, India, who are our collaboration
partners. Further, at times, scaling up the dimension of the projective geometry used
in a computation has been found to improve application performance [1]. In such a
case, the number of processing units again grows exponentially with the dimension. For
practical system implementations with good application performance, it is not feasible
to have a large number of processing units running in parallel, since that incurs high
manufacturing costs. We have therefore focused on designing semi-parallel, or folded,
architectures for such applications. In this paper, we present a scheme for folding
PG-based computations efficiently, which allows a practical implementation with the
following advantages.

1. The number of on-chip processing units reduces. Further, the scheduling of
computations is such that no processing unit is left idle during a computation
cycle.

2. Each processing unit can communicate with memories associated with the other
units using a conflict-free memory access schedule. That is, a schedule can be
generated which ensures that there are no memory access conflicts between
processing units.

3. Data distribution among the memories is such that the address generation circuits
are simplified to counters/look-up tables. Moreover, the distribution ensures that
during the entire computation cycle, a word (smallest unit of data read from a
memory) is read from and written to the same location in the same memory that
it is assigned to.

The last advantage is important because it ensures that the input and write-back phases
of a processing unit are exactly the same as far as the memory accesses are concerned.
Thus the address generation circuits for both the phases are identical. Also, the original
computation being inherently parallel, we can overlap the input and write-back phases
by using simple dual port memories. The core of the aforementioned scheme is based on
adapting the method of vector space partitioning [2] to projective spaces, and hence
involves a fair amount of mathematical rigor.

A restricted scheme of partitioning a PG-based bipartite graph, which solves the same
problem, was worked out earlier using different methods [3]. An engineering-oriented
dual scheme of partitioning has also been worked out. It specifies a complete
synthesis-oriented design methodology for folded architecture design. All this work was done
as part of a research theme of evolving optimal folding architecture design methods, and
also applying such methods in real system design. As part of the second goal, such folding
schemes have been used for the design of specific decoder systems having applications in
secondary storage [11], [1].

In this paper, we begin by giving a brief introduction to projective spaces in section
2. A reader familiar with projective spaces may skip this section. It is followed
by a model of the nature of computations covered, and how they can be mapped
to PG-based graphs, in section 3. Section 4 introduces the concept of folding for
this model of computation. We then present two folding schemes, based on lattice
embedding techniques, and the corresponding schedules for graphs derived from
point-hyperplane incidence relations of a projective space of dimension five, in section 5. We
then generalize these results for graphs derived from arbitrary projective geometry, in
section 6. We provide specifications of some real applications that were built using
these schemes, in the results section (section 7).



2 Projective Spaces 

2.1 Projective Spaces as Finite Field Extension 

We first provide an overview of how the projective spaces are generated from finite
fields. Projective spaces and their lattices are built using vector subspaces of the
bijectively corresponding vector space, one dimension higher, and their subsumption
relations. Vector spaces being extension fields, Galois fields are used to practically
construct projective spaces [1].

Consider a finite field $\mathbb{F} = GF(s)$ with $s$ elements, where $s = p^k$, $p$ being a prime
number and $k$ being a positive integer. A projective space of dimension $d$ is denoted
by $\mathbb{P}(d, \mathbb{F})$ and consists of the one-dimensional vector subspaces of the $(d+1)$-dimensional
vector space over $\mathbb{F}$ (an extension field over $\mathbb{F}$), denoted by $\mathbb{F}^{d+1}$. Elements of this
vector space are denoted by the sequence $(x_1, \ldots, x_{d+1})$, where each $x_i \in \mathbb{F}$. The total
number of such elements is $s^{d+1} = p^{k(d+1)}$. An equivalence relation between these
elements is defined as follows. Two non-zero elements $x$, $y$ are equivalent if there exists
a non-zero element $\lambda \in GF(s)$ such that $x = \lambda y$. Clearly, each equivalence class, taken
together with 0, consists of $s$ elements ($(s-1)$ non-zero elements and 0), and forms a
one-dimensional vector subspace. Such a 1-dimensional vector subspace corresponds to a point in the
projective space. Points are the zero-dimensional subspaces of the projective space.
Therefore, the total number of points in $\mathbb{P}(d, \mathbb{F})$ is

$$P(d) = \frac{s^{d+1} - 1}{s - 1} \qquad (1)$$

An $m$-dimensional projective subspace of $\mathbb{P}(d, \mathbb{F})$ consists of all the one-dimensional
vector subspaces contained in an $(m+1)$-dimensional subspace of the vector space.
The basis of this vector subspace will have $(m+1)$ linearly independent elements,
say $b_0, \ldots, b_m$. Every element of this vector subspace can be represented as a linear
combination of these basis vectors,

$$\sum_{i=0}^{m} \alpha_i b_i, \quad \text{where } \alpha_i \in GF(s) \qquad (2)$$

Clearly, the number of elements in the vector subspace is $s^{m+1}$. The number of
points contained in the $m$-dimensional projective subspace is given by $P(m)$ defined
in equation (1). This $(m+1)$-dimensional vector subspace and the corresponding
projective subspace are said to have a co-dimension of $r = (d - m)$ (the rank of
the null space of this vector subspace). Various properties, such as degree, of an
$m$-dimensional projective subspace remain the same when this subspace is bijectively mapped
to a $(d - m - 1)$-dimensional projective subspace, and vice versa. This is known as the
duality principle of projective spaces.



An example finite field and the corresponding projective geometry can be generated
as follows. For a particular value of $s$ in GF($s$), one first needs to find a primitive
polynomial for the field. Such polynomials are well tabulated in the literature. For
example, for the (smallest) projective geometry, GF($2^3$) is used for generation. One
primitive polynomial for this finite field is $(x^3 + x + 1)$. Powers of the root of this
polynomial, $\alpha$, are then successively taken, $(2^3 - 1)$ times, modulo this polynomial,
modulo 2. This means that $\alpha^3$ is substituted with $(\alpha + 1)$ wherever required, since over
the base field GF(2), $-1 = 1$. A sequence of such evaluations leads to the generation of the
sequence of $(s - 1)$ finite field elements other than 0. Thus, the sequence of $2^3$
elements for GF($2^3$) is 0 (by default), $\alpha^0 = 1$, $\alpha^1 = \alpha$, $\alpha^2 = \alpha^2$, $\alpha^3 = \alpha + 1$,
$\alpha^4 = \alpha^2 + \alpha$, $\alpha^5 = \alpha^2 + \alpha + 1$, $\alpha^6 = \alpha^2 + 1$.
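The generation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `gf_powers` and `as_polynomial`, and the encoding of a field element as an integer whose bit $i$ is the coefficient of $\alpha^i$, are assumptions made here and are not part of the original construction.

```python
# Illustrative sketch (names and bit encoding are assumptions of this example).
# An element of GF(2^degree) is stored as an integer; bit i is the
# coefficient of alpha^i.

def gf_powers(poly: int, degree: int) -> list[int]:
    """Return [alpha^0, ..., alpha^(2^degree - 2)] reduced modulo `poly`."""
    elements = []
    x = 1  # alpha^0
    for _ in range((1 << degree) - 1):
        elements.append(x)
        x <<= 1                    # multiply by alpha
        if x & (1 << degree):      # alpha^degree appeared: reduce it
            x ^= poly              # subtraction equals addition over GF(2)
    return elements


def as_polynomial(value: int) -> str:
    """Render an element as a polynomial in 'a' (standing for alpha)."""
    names = {0: "1", 1: "a"}
    terms = [names.get(i, f"a^{i}")
             for i in reversed(range(value.bit_length()))
             if (value >> i) & 1]
    return " + ".join(terms) if terms else "0"


if __name__ == "__main__":
    # x^3 + x + 1 is encoded as the bit pattern 1011.
    for power, element in enumerate(gf_powers(0b1011, 3)):
        print(f"alpha^{power} = {as_polynomial(element)}")
```

With the polynomial $x^3 + x + 1$, the loop reproduces the seven non-zero elements in the order listed in the text.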





Figure 1: 2-dimensional Projective Geometry. (a) Line-point Association. (b) Bipartite Representation.

To generate the projective geometry corresponding to the above Galois field example (GF($2^3$)),
the 2-dimensional projective plane, we treat each of the above non-zero elements, the
lone non-zero elements of the various 1-dimensional vector subspaces, as points of the
geometry. Further, we pick the various 2-dimensional vector subspaces of GF($2^3$), and label them
as lines. Thus, the seven lines of the projective plane are
$\{1, \alpha, \alpha^3 = 1 + \alpha\}$,
$\{1, \alpha^2, \alpha^6 = 1 + \alpha^2\}$,
$\{\alpha, \alpha^2, \alpha^4 = \alpha^2 + \alpha\}$,
$\{1, \alpha^4 = \alpha^2 + \alpha, \alpha^5 = \alpha^2 + \alpha + 1\}$,
$\{\alpha, \alpha^5 = \alpha^2 + \alpha + 1, \alpha^6 = \alpha^2 + 1\}$,
$\{\alpha^2, \alpha^3 = \alpha + 1, \alpha^5 = \alpha^2 + \alpha + 1\}$ and
$\{\alpha^3 = 1 + \alpha, \alpha^4 = \alpha + \alpha^2, \alpha^6 = 1 + \alpha^2\}$.
The corresponding geometry can be seen in figure 1.

Let us denote the collection of all the $l$-dimensional projective subspaces by $\Omega_l$. Now,
$\Omega_0$ represents the set of all the points of the projective space, $\Omega_1$ is the set of all lines,
$\Omega_2$ is the set of all planes and so on. To count the number of elements in each of these
sets, we define the function



$$\phi(n, l, s) = \frac{(s^{n+1} - 1)(s^{n} - 1) \cdots (s^{n-l+1} - 1)}{(s - 1)(s^{2} - 1) \cdots (s^{l+1} - 1)} \qquad (3)$$



Now, the number of $m$-dimensional projective subspaces of $\mathbb{P}(d, \mathbb{F})$ is $\phi(d, m, s)$. For
example, the number of points contained in $\mathbb{P}(d, \mathbb{F})$ is $\phi(d, 0, s)$. Also, the number of
$l$-dimensional projective subspaces contained in an $m$-dimensional projective subspace
(where $0 \le l < m \le d$) is $\phi(m, l, s)$, while the number of $m$-dimensional projective
subspaces containing a particular $l$-dimensional projective subspace is
$\phi(d - l - 1, m - l - 1, s)$.
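As a quick numerical check of the counting function in equation (3), the sketch below evaluates it for the small plane over GF(2) described earlier. The function name `phi` and its argument order simply mirror the notation $\phi(n, l, s)$; the implementation itself is an assumption of this example.

```python
# Illustrative sketch of the counting function phi(n, l, s) of equation (3).

def phi(n: int, l: int, s: int) -> int:
    """Number of l-dimensional projective subspaces of P(n, GF(s))."""
    numerator = 1
    denominator = 1
    for i in range(l + 1):
        numerator *= s ** (n + 1 - i) - 1   # (s^{n+1}-1) ... (s^{n-l+1}-1)
        denominator *= s ** (i + 1) - 1     # (s-1) ... (s^{l+1}-1)
    return numerator // denominator         # the quotient is always an integer


if __name__ == "__main__":
    d = 2  # projective plane over GF(2)
    print("points:", phi(d, 0, 2))                        # 7
    print("lines:", phi(d, 1, 2))                         # 7
    print("points on a line:", phi(1, 0, 2))              # 3
    print("lines through a point:", phi(d - 0 - 1, 1 - 0 - 1, 2))  # 3
```

The four printed values agree with the seven points, seven lines, and three-per-line incidence of the example plane.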
Figure 2: A Lattice Representation for 2-dimensional Projective Space



2.2 Projective Spaces as Lattices 

It is a well-known fact that the lattice of subspaces in any projective space is a
modular, geometric lattice [13]. A projective space of dimension 2 is shown in figure
2. In the figure, the top-most node represents the supremum, which is a projective
space of dimension m in a lattice for P(m, GF(q)). The bottom-most node represents
the infimum, which is a projective space of (notational) dimension -1. Each node in
the lattice as such is a projective subspace, called a flat. Each horizontal level of flats
represents a collection of all projective subspaces of P(m, GF(q)) of a particular
dimension. For example, the first level of flats above the infimum are flats of dimension 0,
the next level are flats of dimension 1, and so on. Some levels have special names.
The flats of dimension 0 are called points, flats of dimension 1 are called lines, flats
of dimension 2 are called planes, and flats of dimension (m-1) in an overall projective
space P(m, GF(q)) are called hyperplanes.

2.3 Relationship between Projective Subspaces 

Throughout the remainder of the paper, we relate projective subspaces of
various types. We define the following terms for relating projective subspaces.

Contained in If a projective subspace X is said to be contained in another projective
subspace Y, then the vector subspace corresponding to X is itself a vector subspace
of the vector subspace corresponding to Y. This means that the vectors
contained in the subspace of X are also contained in the subspace of Y. In terms of projective
spaces, the points that are part of X are also part of Y. The inverse relationship
is termed 'contains', e.g. "Y contains X".

Reachable from If a projective subspace X is said to be reachable from another
projective subspace Y, then there exists a chain (path) in the corresponding lattice
diagram of the projective space, such that both the flats X and Y lie on that
particular chain. There is no directional sense in this relationship.

2.4 Union of Projective Subspaces 

Projective spaces are point lattices. Hence the union of two projective subspaces is
defined not only as the set-theoretic union of all points (1-dimensional vector subspaces)
which are part of the individual projective subspaces, but also all the linear combinations of
vectors in all such 1-dimensional vector subspaces. This is to ensure the closure of the
newly formed, higher-dimensional projective subspace. In the lattice representation,
the flat corresponding to the union is reachable from a few more points than those contained
in the flats whose union is taken.



3 A Model for Computations Involved 

3.1 The Computation Graph 

The 0-dimensional subspaces of a d-dimensional projective space P(d, F)
are called the points, and the (d − 1)-dimensional subspaces are called the
hyperplanes. Let V be the (d + 1)-dimensional vector space corresponding to the projective
space P(d, F). Then, as stated in the previous section, points will correspond to
1-dimensional vector subspaces of V, and hyperplanes will correspond to d-dimensional
vector subspaces of V. A bipartite graph is constructed from the point-hyperplane
incidence relations as follows.

• Each point is mapped to a unique vertex in the graph. Each hyperplane is also 
mapped to a unique vertex of the graph. 

• An edge exists between two vertices iff one of those vertices represents a point, the 
other represents a hyperplane and the 1-dimensional vector space corresponding 
to the point is contained in the d-dimensional vector space corresponding to the 
hyperplane. 

From the above construction, it is clear that the graph obtained will be bipartite; the 
vertices corresponding to points will form one partition and the vertices corresponding 
to the hyperplanes will form the other. Edges only exist between the two partitions. 
A point and hyperplane are said to be incident on each other if there exists an edge 
in between the corresponding vertices. 

Points and hyperplanes form dual projective subspaces; the number of points contained
in a particular hyperplane is given by $\phi(d - 1, 0, s)$ and the number of hyperplanes
containing a particular point is given by $\phi(d - 0 - 1, d - 1 - 0 - 1, s) = \phi(d - 1, d - 2, s)$.
After substitution into equation (3), it can easily be verified that $\phi(d - 1, 0, s) =
\phi(d - 1, d - 2, s)$. Thus, the graph constructed is a regular balanced bipartite graph,
with each vertex having a degree of $\phi(d - 1, 0, s)$.
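The construction can be illustrated for the special case $s = 2$. In the sketch below, a point is a non-zero bit vector of length $d + 1$, and each hyperplane is identified with a non-zero "normal" bit vector, the point being incident when the dot product over GF(2) is zero. This dot-product representation of hyperplanes, and the names `incidence_graph` and `degrees`, are assumptions of this example rather than notation from the paper.

```python
# Illustrative sketch for s = 2 only (representation is an assumption).
from collections import Counter


def incidence_graph(d: int) -> dict[int, list[int]]:
    """Map each point of P(d, GF(2)) to the hyperplanes incident on it."""
    size = 1 << (d + 1)
    return {
        p: [h for h in range(1, size)
            if bin(p & h).count("1") % 2 == 0]   # dot product over GF(2) is 0
        for p in range(1, size)
    }


def degrees(graph: dict[int, list[int]]) -> tuple[set[int], set[int]]:
    """Return the sets of distinct point degrees and hyperplane degrees."""
    point_degrees = {len(hs) for hs in graph.values()}
    hyperplane_degrees = set(
        Counter(h for hs in graph.values() for h in hs).values())
    return point_degrees, hyperplane_degrees


if __name__ == "__main__":
    for d in (2, 5):
        graph = incidence_graph(d)
        print(d, len(graph), degrees(graph))
```

For $d = 2$ and $d = 5$ every vertex on both sides has the same degree, $2^d - 1$, which matches $\phi(d - 1, 0, 2)$.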

In fact, many possible pairs of dual projective subspaces could be chosen to construct 
the graph, on which our folding scheme can be applied. Points and hyperplanes are 
the preferred choice because usually the applications require the graph to have a high 
degree. Choosing points and hyperplanes gives the maximum possible degree for a 
given dimension of projective space. 

3.2 Description of Computations 

The computations that can be covered using this design scheme are mostly applicable
to the popular class of iterative decoding algorithms for error correcting codes, like
LDPC or expander codes. A representation of such a computation is generally available
in the model described above, though it may go by some other domain-specific name
such as Tanner graph. The edges of such a representative graph are considered as the
variables/data of the system. A vertex of the graph represents the computation of a
constraint that needs to be satisfied by the variables corresponding to the edges (data)
incident on the vertex. An edge-vertex incidence graph (EV-graph) is derived from
the above graph. The EV-graph is bipartite, with one set of vertices representing
variables and the other set of vertices representing the constraints. The decoding
algorithm involves evaluation of all the constraints in parallel, and an update of the
variables based on the evaluation. The vertices corresponding to constraints represent
computations that are to be assigned to, and scheduled on, certain processing units.

4 The Concept of Folding 

Semi-parallel, or folded, architectures are hardware-sharing architectures, in which
hardware components are shared/overlaid for performing different parts of the computation
within a (single) computation. In its basic form, folding is a technique in which more
than one algorithmic operation of the same type is mapped to the same hardware
operator. This is achieved by time-multiplexing these multiple algorithmic operations
of the same type onto a single functional unit at system runtime.

The balanced bipartite PG graphs of various target applications perform parallel
computation, as described in section 3.2. In its classical sense, a folded architecture
represents a partition, or a fold, of such a (balanced) bipartite graph (see figure 3). The
blocks of the partition, or folds, can themselves be balanced or unbalanced; unbalanced
folding entails no obvious advantage. The computational folding can be implemented
after (balanced) graph partitioning in two ways. In the first way, which we cover in this
paper, the within-fold computation is done sequentially, and the across-fold computation
is done in parallel. This implies that many such sequentially operating folds are
scheduled in parallel. This more popular scheme is generally called a supernode-based folded
design, since a logical supernode is held responsible for operating over a fold. Dually,
the across-fold computation can be made sequential by scheduling the first node of the first
fold, the first node of the second fold, and so on, sequentially on a single module. The within-fold
computations, held by various nodes in the fold, can hence be made parallel by scheduling
them over different hardware modules. Either way, such a folding is represented by a
time-schedule, called the folding schedule. The schedule specifies which computations
are scheduled in parallel on the various functional units in each machine cycle, and
also the sequence of clusters of such parallel computations across machine cycles.
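A minimal sketch of the first (supernode-based) kind of folding schedule is given below, assuming the folds are already available as equal-sized lists of vertex labels. The function name `supernode_schedule` and the vertex labels are hypothetical and serve only to illustrate the shape of a folding schedule.

```python
# Illustrative sketch: sequential within a fold, parallel across folds.
# Function name and vertex labels are hypothetical.

def supernode_schedule(folds: list[list[str]]) -> list[list[str]]:
    """Return, for each machine cycle, the vertices computed in parallel.

    folds[f] is the block of vertices assigned to functional unit f.
    In cycle t, unit f computes the t-th vertex of its own fold.
    """
    length = len(folds[0])
    if any(len(fold) != length for fold in folds):
        raise ValueError("balanced folding expects equal-sized folds")
    return [[fold[t] for fold in folds] for t in range(length)]


if __name__ == "__main__":
    folds = [["v0", "v1"], ["v2", "v3"], ["v4", "v5"]]
    for cycle, cluster in enumerate(supernode_schedule(folds)):
        print(f"cycle {cycle}: {cluster}")
```

In this toy case, six computations are executed on three functional units in two machine cycles.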




Figure 3: (Unevenly) Partitioned Bipartite DFG 



4.1 Lattice Embedding 

Projective space lattices being modular lattices, it is also possible to exploit the
symmetry of (lattice) property reflection about a mid-way embedded level of flats between
any two dual levels of flats which form a balanced bipartite graph based on their
inter-reachability. For point-hyperplane bipartite graphs, this specialized scheme of
folding is what we discuss here as one of the schemes. The other scheme involves the usage
of two dual mid-way embedded levels. Both these schemes are examples of the first
type of folding: sequential within, and parallel across, folds. We term such schemes
lattice embedding schemes, since the actual functional units (supernodes) are embedded
at proper places (mid-way flats) in the corresponding PG lattice. An illustration
of such folding is provided in figure 4.
Figure 4: Folding PG Graph via Lattice Embedding



5 A Folding Scheme for P(5, GF(2))

In this section, we provide two schemes to demonstrate the possibilities of folding
computations related to point-hyperplane incidence graphs derived from the 5-dimensional
PG over GF(2), P(5, GF(2)). The schemes are summarized by the following two
propositions.

Proposition 1. When considering computations based on the point-hyperplane
incidence graph of P(5, GF(2)), it is possible to fold the computations and arrange a
scheduling that can be executed using 9 processing units and 9 dual port memories.

Proposition 2. When considering computations based on the point-hyperplane
incidence graph of P(5, GF(2)), it is possible to fold the computations and arrange a
scheduling that can be executed using 21 processing units and 21 dual port memories.
5.1 Proof of Propositions

5.1.1 Some Cardinalities of P(5,GF(2)) Lattice 

We first present some important combinatorial figures associated with P(5, GF(2)). We
will use these numbers in proving the functional correctness of the folded computations.
For the definition of $\phi(\cdot)$, refer to equation (3).

• No. of points = No. of hyperplanes (4-dimensional projective subspaces) =
$\phi(5, 0, 2) = 63$

• No. of points contained in a particular hyperplane = $\phi(4, 0, 2) = 31$

• No. of points contained in a line (1-dimensional projective subspace) = $\phi(1, 0, 2) = 3$

• No. of points contained in a plane (2-dimensional projective subspace) = $\phi(2, 0, 2) = 7$

• No. of points contained in a 3-dimensional projective subspace = $\phi(3, 0, 2) = 15$

• No. of hyperplanes containing a particular plane = $\phi(5 - 2 - 1, 4 - 2 - 1, 2) = 7$

• No. of hyperplanes containing a particular line = $\phi(5 - 1 - 1, 4 - 1 - 1, 2) = 15$

• No. of hyperplanes containing a particular 3-dimensional projective subspace =
$\phi(5 - 3 - 1, 4 - 3 - 1, 2) = 3$

• No. of lines contained in a 3-dimensional projective subspace = $\phi(3, 1, 2) = 35$
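The cardinalities listed above can also be confirmed by brute-force enumeration, independently of equation (3). The sketch below assumes the same bit-vector model as before: points are the non-zero 6-bit vectors, addition is XOR, and a hyperplane is identified with a non-zero normal vector. The helper names (`span`, `incident`, `hyperplanes_containing`) and the particular flats chosen are assumptions of this example.

```python
# Illustrative brute-force check of the cardinalities for P(5, GF(2)).
# The bit-vector model and all helper names are assumptions of this sketch.

DIM = 6                          # dimension of the underlying vector space
POINTS = range(1, 1 << DIM)      # non-zero vectors, one per projective point


def span(basis: list[int]) -> frozenset[int]:
    """Non-zero vectors of the subspace spanned by `basis` (addition is XOR)."""
    vectors = {0}
    for b in basis:
        vectors |= {v ^ b for v in vectors}
    return frozenset(vectors - {0})


def incident(point: int, normal: int) -> bool:
    """True if `point` lies on the hyperplane with normal vector `normal`."""
    return bin(point & normal).count("1") % 2 == 0


def hyperplanes_containing(flat: frozenset[int]) -> list[int]:
    return [h for h in POINTS if all(incident(p, h) for p in flat)]


line, plane, solid = span([1, 2]), span([1, 2, 4]), span([1, 2, 4, 8])
lines_in_solid = {span([a, b]) for a in solid for b in solid if a != b}

counts = {
    "points": len(POINTS),
    "points per hyperplane": sum(incident(p, 1) for p in POINTS),
    "points per line": len(line),
    "points per plane": len(plane),
    "points per 3-d subspace": len(solid),
    "hyperplanes through a plane": len(hyperplanes_containing(plane)),
    "hyperplanes through a line": len(hyperplanes_containing(line)),
    "hyperplanes through a 3-d subspace": len(hyperplanes_containing(solid)),
    "lines in a 3-d subspace": len(lines_in_solid),
}

if __name__ == "__main__":
    for name, value in counts.items():
        print(f"{name}: {value}")
```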

5.1.2 Lemmas for Proving Proposition 1

We prove the following lemmas, required to establish the feasibility of the folding and
scheduling claimed in proposition 1.

Lemma 1. The point set of a projective space of dimension 5 over GF(2) (represented
by the non-zero elements of a vector space V over GF(2)) can be partitioned into disjoint
subsets such that each subset contains all the non-zero elements of a 3-dimensional
vector subspace of V. Thus, each such block of the partition/subset represents a unique
plane (2-dimensional projective subspace).

Proof. The vector space V is represented by the field GF($2^6$), whose multiplicative
group has order 63. Since 3 is a divisor of 6, GF($2^3$) is a subfield of GF($2^6$). The
multiplicative cyclic group of GF($2^3$) (of order 7) is isomorphic to a subgroup of the
multiplicative cyclic group of GF($2^6$). Hence, we can perform a coset decomposition to
partition the non-zero elements of V into 9 disjoint subsets such that each subset is a
3-dimensional vector space (minus {0}), representing a 2-dimensional projective space,
i.e. a plane [2].

For details, assume that $\alpha$ is a generator for the multiplicative group of GF($2^6$). Then,
$(\alpha^9)$ is the 7-element subgroup that we are looking for. The distinct cosets of this
subgroup provide the partition that we need to generate disjoint projective subspaces. □
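The coset decomposition in the proof can be checked computationally. The sketch below assumes $x^6 + x + 1$ as the primitive polynomial for GF($2^6$); the proof does not name a polynomial, so this choice, along with the names `gf_powers`, `coset_planes` and `is_subspace_without_zero`, is an assumption of this example. Field elements are again encoded as bit vectors, so field addition is XOR.

```python
# Illustrative sketch of the coset decomposition of Lemma 1.
# The primitive polynomial x^6 + x + 1 and all names are assumptions.

def gf_powers(poly: int, degree: int) -> list[int]:
    """Return [alpha^0, ..., alpha^(2^degree - 2)] reduced modulo `poly`."""
    elements, x = [], 1
    for _ in range((1 << degree) - 1):
        elements.append(x)
        x <<= 1
        if x & (1 << degree):
            x ^= poly
    return elements


ALPHA = gf_powers(0b1000011, 6)   # alpha^0 .. alpha^62 in GF(2^6)


def coset_planes() -> list[frozenset[int]]:
    """Cosets alpha^i * <alpha^9>, i = 0..8, of the 7-element subgroup."""
    return [frozenset(ALPHA[(i + 9 * k) % 63] for k in range(7))
            for i in range(9)]


def is_subspace_without_zero(block: frozenset[int]) -> bool:
    """True if block plus {0} is closed under addition (XOR)."""
    return all((a ^ b) in block for a in block for b in block if a != b)


if __name__ == "__main__":
    planes = coset_planes()
    print("distinct non-zero elements:", len(set(ALPHA)))
    print("blocks:", len(planes), "sizes:", {len(p) for p in planes})
    print("all blocks are planes:", all(map(is_subspace_without_zero, planes)))
```

Each of the 9 cosets has 7 elements and is closed under addition once 0 is adjoined, so each represents a plane, and together they cover all 63 points.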

Corollary 2. The above partitioning leads to a partitioning of the set of hyperplanes
(4-dimensional projective subspaces) as well. For each block of the hyperplane partition,
a (projective) plane can always be found that contains the intersection of all the
hyperplanes belonging to that block, but is itself not contained in any hyperplane
outside the block. Here, intersection of projective subspaces
implies the intersection of their corresponding point sets.

Proof. In P(5, GF(2)), each (projective) plane is contained in 7 hyperplanes. These
hyperplanes are unique to the plane, since they represent the 7 hyperplanes that are
common to the set of 7 points that form the plane. More explicitly, if two planes do
not have any point in common, they will not be contained in any common
hyperplane, and vice versa. Thus, the 9 disjoint planes partition the hyperplane set
into 9 disjoint subsets. □

Lemma 3. In projective spaces over GF(2), any subset of points (hyperplanes) having
cardinality of 4 or more has 3 non-collinear (independent) points (hyperplanes).

Proof. The underlying vector space is constructed over GF(2). Hence, any 2-dimensional
vector subspace contains the zero vector, and non-zero vectors of the form $\alpha a + \beta b$.
Here, $a$ and $b$ are linearly independent non-zero vectors, and $\alpha$ and $\beta$
can be either 0 or 1, but not simultaneously zero:

$\alpha, \beta \in GF(2), \quad (\alpha, \beta) \neq (0, 0)$.

Thus, any such 2-dimensional vector subspace contains exactly 3 non-zero vectors.
Therefore, in any subset of 4 or more points of a projective space over GF(2) (which
represent one-dimensional vector subspaces of the corresponding vector space), at least
one point is not contained in the 2-dimensional vector subspace formed by any 2
points picked from the subset. Thus in such a subset, a further subset of 3 independent
points (hyperplanes), i.e. 3 non-collinear vectors, can always be found. □

Lemma 4. In P(5, GF(2)), any point that does not lie on a plane $P_1$, but lies on some
disjoint plane $P_2$, is contained in exactly 3 hyperplanes reachable from $P_1$. The
converse is also true. This lemma is used in section 5.1.3.

Proof. If a point on plane $P_2$, which is not reachable from plane $P_1$, is contained in 4 or
more hyperplanes (out of 7) reachable from plane $P_1$, then by lemma 3 we can always
find a subset of 3 independent hyperplanes in this set of 4. In which case, the point
will also be reachable from the linear combinations of these 3 independent hyperplanes, and
hence from all the 7 hyperplanes which lie on plane $P_1$. This contradicts the assumption
that the point under consideration is not contained in plane $P_1$. The roles of planes $P_1$
and $P_2$ can be interchanged, as well as the roles of points and hyperplanes, to prove the
remaining alternate propositions.

Hence if the point considered above is contained in exactly 3 hyperplanes reachable
from $P_1$, then these 3 hyperplanes cannot be independent, following the same
argument as above. If the 3 hyperplanes are not independent of one another, then it is
indeed possible for such a point to be contained in 3 hyperplanes as follows. Let there
be 2 disjoint planes $P_1$ and $P_2$ in P(5, GF(2)), whose sets of independent points are
represented by (a, b, c) and (d, e, f). Then, a point (e.g. d) on $P_2$ is reachable from
exactly 3 hyperplanes (a, b, c, d, e), (a, b, c, d, f) and (a, b, c, d, (e + f)), which lie on
$P_1$. □

From the above three lemmas, it is easy to deduce that a point reachable from a plane
$P_i$ is further reachable from 7 hyperplanes through $P_i$, and 3 hyperplanes from each of
the remaining 8 disjoint planes. The 9 disjoint planes can be found via the construction in
lemma 1. Thus, in a point-hyperplane graph made from P(5, GF(2)), the total degree
of each point, 31, can be partitioned into 8 × 3 + 7 = 31 by using embedded disjoint
projective planes, a result of paramount importance in our scheme. This is true for
all points with respect to the planes that they are reachable from, and all hyperplanes
with respect to the plane they contain. This symmetry is used to derive a conflict-free
memory access schedule and its corresponding data distribution scheme.
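The degree partition 8 × 3 + 7 = 31 can be verified exhaustively for all 63 points. The sketch below makes two assumptions that are not taken from the paper: GF($2^6$) is generated with the primitive polynomial $x^6 + x + 1$, and each hyperplane is identified with a non-zero normal bit vector under the GF(2) dot product. The names `PLANES`, `BLOCKS` and `degree_split` are likewise hypothetical.

```python
# Illustrative exhaustive check of the 7 + 8*3 = 31 degree partition.
# Primitive polynomial, hyperplane representation and names are assumptions.

def gf64_powers() -> list[int]:
    """alpha^0 .. alpha^62 in GF(2^6), reduced modulo x^6 + x + 1."""
    elements, x = [], 1
    for _ in range(63):
        elements.append(x)
        x <<= 1
        if x & 0b1000000:
            x ^= 0b1000011
    return elements


def incident(point: int, normal: int) -> bool:
    """True if `point` lies on the hyperplane with normal vector `normal`."""
    return bin(point & normal).count("1") % 2 == 0


ALPHA = gf64_powers()

# The 9 disjoint planes: cosets of the 7-element subgroup generated by alpha^9.
PLANES = [frozenset(ALPHA[(i + 9 * k) % 63] for k in range(7))
          for i in range(9)]

# BLOCKS[i] holds the hyperplanes that contain plane i.
BLOCKS = [[h for h in range(1, 64) if all(incident(p, h) for p in plane)]
          for plane in PLANES]


def degree_split(point: int) -> list[int]:
    """Number of hyperplanes through `point` within each of the 9 blocks."""
    return [sum(incident(point, h) for h in block) for block in BLOCKS]


if __name__ == "__main__":
    point = next(iter(PLANES[0]))
    print(degree_split(point), sum(degree_split(point)))
```

For a point on plane $i$, the split contains a 7 at position $i$ and a 3 everywhere else, and the 9 hyperplane blocks are pairwise disjoint.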

5.1.3 Proof of Proposition 1

We now prove the existence of the folding mentioned in proposition 1 constructively, by
providing an algorithm below for folding the graph, as well as for scheduling computations
over such a folded graph.

We have shown above in lemma 1 that we can partition the set of points into 9 disjoint
subsets (each corresponding to a plane). The algorithm for partitioning is based on
[2]. Also, there is a corresponding partitioning of the hyperplane set. Let the planes
assigned to the partitions be $P_0, P_1, \ldots, P_8$. We will assign a processing unit (system
resource) to each of the planes. Let us abuse notation and call the processing units by
the name corresponding to the plane assigned to them. We add a commercial
off-the-shelf dual port memory $M_i$ to each processing unit $P_i$, to store computational
data. We then provide a proven schedule that avoids memory conflicts. The
distribution of data among the memories follows from the schedule. Both these issues
are addressed below.

For the computations we are considering, the processing units perform computations on
behalf of the points as well as the hyperplanes. The data symbols are represented by the
edges of the bipartite graph. The overall computation is broken into two phases. Phase
1 corresponds to the point vertices performing the computations, using the edges,
and updating the necessary data symbols. Phase 2 corresponds to the hyperplanes
performing the computations, and updating the necessary data symbols.
The data distribution among the memories local to the 9 processing units ($M_0, M_1, \ldots, M_8$),
and the subsequent scheduling, is done as follows.

• In Phase 1, processing unit $P_i$ performs the computations corresponding to the
points that are contained in the plane $P_i$, in a sequential fashion. For each of the
7 points contained in the plane $P_i$, there will be 7 units of data corresponding
to the 7 hyperplanes containing plane $P_i$. Thus, 49 units of data corresponding to
them will be stored in $M_i$. In addition to this, $M_i$ will further store 3 units of
data for each of the 56 remaining points not contained in $P_i$. These 3 units of
data correspond to the incidence of each of the points not contained in $P_i$ with
some 3 hyperplanes containing the plane $P_i$; see lemma 4.

Lemma 5. The distribution of data as described above leads to a conflict-free
memory-access pattern when all the processing units compute in
parallel, performing the same computation on different data.

Proof. Suppose processing unit $P_i$ is beginning the computation cycle corresponding
to some point $a$. It needs to fetch data from the memories, perform
some computation and write back the output of the computation. First, it collects
the 7 units of data corresponding to $a$ in $M_i$. These correspond to the edges
that exist between point $a$ and the hyperplanes that contain plane $P_i$. Next, it
fetches 3 units of data from each of the remaining 8 memories. These are
the edges between $a$ and the 3 hyperplanes from each of the planes not containing
$a$. Thus, $7 + 3 \times 8 = 31$ units of data will be fetched for $a$. All 9
processing units work in parallel, and each of them follows the same schedule.

For processing unit $P_i$, first, 7 units of data are fetched locally from $M_i$. Further,
3 units of data are fetched from each $M_{(i+j) \bmod 9}$, with $j$ going from 1 to 8. Thus,
during the time when $P_0$ is accessing $M_1$, $P_1$ will be accessing $M_2$, and so on until
we reach $P_8$, which will be accessing $M_0$. In this fashion, no two processing units
will try to access the same memory at the same time, i.e. no memory access
conflicts will occur. □

The writing of the output is done with the same schedule. If dual port memories 
are used, we can overlap the writing of the output of one point using one port, 
with the reading of the input of the next point using the other port. 

• In Phase 2, processing unit $P_i$ performs the computation corresponding to the
hyperplanes that contain plane $P_i$. If the data is distributed as explained in the
previous point, then $M_i$ already contains all the data required for the hyperplanes
containing plane $P_i$. In this case, the processing unit communicates only with its
own memory and performs the computation.

For the above data distribution, the address generator circuit in Phase 1 is just a counter,
while in Phase 2 it becomes a look-up table. The address generation circuits are
incorporated within the processing unit itself. As can be observed from the above
discussion, when different computations are scheduled on the same physical processing
unit, data does not need any internal or external shuffling across memories associated
with other processing units. This, along with complete conflict freedom in memory
accesses, removes the significant overhead of general folding schemes, which
includes shuffling of data between the scheduling of two folds. Thus one achieves the best
theoretically possible throughput in such designs.
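The rotation used in the proof of Lemma 5 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes the units and memories are indexed 0 to 8 and that all units advance in lock-step, one rotation step at a time. The names `memory_accessed`, `build_schedule`, `is_conflict_free` and `reads_per_point` are hypothetical.

```python
NUM_UNITS = 9       # processing units P_0..P_8 and memories M_0..M_8
LOCAL_READS = 7     # data units read from the unit's own memory per point
REMOTE_READS = 3    # data units read from each of the other 8 memories


def memory_accessed(unit: int, step: int) -> int:
    """Memory index touched by `unit` at rotation step `step` (step 0 is local)."""
    return (unit + step) % NUM_UNITS


def build_schedule() -> list[list[int]]:
    """Row `step` lists the memory accessed by each unit during that step."""
    return [[memory_accessed(unit, step) for unit in range(NUM_UNITS)]
            for step in range(NUM_UNITS)]


def is_conflict_free(schedule: list[list[int]]) -> bool:
    """True if no two units touch the same memory within any single step."""
    return all(len(set(row)) == len(row) for row in schedule)


def reads_per_point() -> int:
    """Data units fetched for one point: 7 local plus 3 from each other memory."""
    return LOCAL_READS + REMOTE_READS * (NUM_UNITS - 1)


if __name__ == "__main__":
    print(is_conflict_free(build_schedule()), reads_per_point())
```

Every row of the schedule is a cyclic shift of (0, 1, ..., 8), which is why each step touches nine distinct memories.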

5.1.4 Lemmas for Proving Proposition [2] 

We now move on to proving Proposition [2]. Proposition [2] represents moving away from
choosing the exact mid-way level of flats in the PG lattice, to multiple choices of two dual
levels of flats, for folding purposes. Thus it is a generalization of Proposition [1].

Lemma 6. The point set of a projective space of dimension 5 over GF(2) (represented
by the non-zero elements of a vector space V of dimension 6 over GF(2)) can be partitioned
into disjoint subsets such that each subset contains the non-zero elements of
a 2-dimensional vector subspace of V. Each subset/block of the partition represents a
unique line (1-dimensional projective subspace).

Proof. The proof is very similar to that of lemma [1]. As before, V is represented by $GF(2^6)$ and,
since 2 divides 6, $GF(2^2)$ is a subfield. Thus V can be partitioned into disjoint vector
subspaces (minus $\{0\}$) of dimension 2 each (using coset decomposition [2]). Each of these
vector subspaces represents a 1-dimensional projective subspace (line), and contains 3
points. □
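The partition of Lemma 6 can be made concrete. The sketch below is only an illustration under stated assumptions: $GF(2^6)$ is built with the polynomial $x^6 + x + 1$ (our choice, assumed primitive; any primitive polynomial of degree 6 would serve), field elements are stored as 6-bit integers, and addition is XOR. The function names are hypothetical.

```python
PRIMITIVE_POLY = 0b1000011  # x^6 + x + 1, an assumed choice of primitive polynomial


def alpha_powers() -> list[int]:
    """Return [alpha^0, ..., alpha^62] as 6-bit integers (bit k = coefficient of x^k)."""
    powers, x = [], 1
    for _ in range(63):
        powers.append(x)
        x <<= 1                   # multiply by alpha
        if x & 0b1000000:         # degree reached 6: reduce modulo the polynomial
            x ^= PRIMITIVE_POLY
    return powers


def partition_into_lines() -> list[frozenset[int]]:
    """Blocks {alpha^i, alpha^(i+21), alpha^(i+42)} for i = 0..20."""
    p = alpha_powers()
    return [frozenset({p[i], p[i + 21], p[i + 42]}) for i in range(21)]


def is_line(points: frozenset[int]) -> bool:
    """Three distinct non-zero vectors over GF(2) form a line iff they XOR to zero."""
    a, b, c = points
    return a ^ b ^ c == 0


if __name__ == "__main__":
    blocks = partition_into_lines()
    print(len(blocks), all(is_line(block) for block in blocks))
```

Each block is the non-zero part of a coset of the subfield $\{0, \alpha^0, \alpha^{21}, \alpha^{42}\}$, so the XOR test succeeds because $1 + \alpha^{21} + \alpha^{42} = 0$.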

Corollary 7. The dual of the above partitioning partitions the set of hyperplanes (4-dimensional
projective subspaces) into disjoint subsets of 3 hyperplanes each. A unique
3-dimensional projective subspace can always be found that is contained in all the 3
hyperplanes of one such subset, and in none from any other subset.

Proof. By duality of projective geometry, it follows that we can partition the set of
hyperplanes into disjoint subsets of 3 each, such that each subset represents a unique
3-dimensional projective subspace. This can be achieved by performing a suitable coset
decomposition of the dual vector space. □

Thus, we can partition the set of 63 points into 21 sets of 3 points each. In this way,
each point is associated with exactly one line of the partition, namely the line that contains it.
We now provide certain lemmas that will be needed to prove theorem [11] later, which
establishes the existence of a conflict-free schedule.

Lemma 8. The union of two disjoint lines (1-dimensional projective subspaces) in
P(5, GF(2)) leads to a 3-dimensional projective subspace.



Proof. Let the two disjoint lines be $L_1$ and $L_2$. Being disjoint, they have no points
in common. Each line, being a 2-dimensional vector space, contains exactly two independent points.
Thus, two disjoint lines contain 4 independent points. Taking the union, we get all
possible linear combinations of the 4 independent points, which corresponds to a 4-dimensional
vector space (let us call it $T_{12}$). The 4 independent points have been taken
from the points of the 6-dimensional vector space V used to describe the projective
space. Thus, the 4-dimensional vector space made up of all the linear combinations
of the 4 points is a vector subspace of V.

Being a 4-dimensional vector subspace of V, $T_{12}$ represents a 3-dimensional projective
subspace. □

Lemma 9. For P(5, GF(2)), let $L = \{L_0, L_1, \ldots, L_{20}\}$ be the set of 21 disjoint lines obtained
after coset decomposition of V. Let $T_{ij}$ be the 3-dimensional projective space obtained
after taking the union of the lines $L_i, L_j$, both taken from the set L. Then, any line
$L_k$ from the set L is either contained in $T_{ij}$ or does not share any point with $T_{ij}$.
Specifically, if it shares one point with $T_{ij}$, then it shares all its points with $T_{ij}$.

Proof. Let $\alpha$ be the generator of the cyclic multiplicative group of $GF(2^6)$. Then the
points of the projective space are given by $\{\alpha^0, \alpha^1, \ldots, \alpha^{62}\}$ and, for any integer $i$,

$$\alpha^i = \alpha^{(i \bmod 63)}$$

The lines of a projective space are equivalent to 2-dimensional vector subspaces. After
the relevant coset decomposition of $GF(2^6)$, as per lemma [6], without loss of generality,
we can generate a correspondence between lines of L and the cosets as follows.

$$
\begin{aligned}
L_0 &= \{\alpha^0, \alpha^{21}, \alpha^{42}\} \\
L_1 &= \{\alpha^1, \alpha^{22}, \alpha^{43}\} \\
L_2 &= \{\alpha^2, \alpha^{23}, \alpha^{44}\} \\
&\;\;\vdots \\
L_{19} &= \{\alpha^{19}, \alpha^{40}, \alpha^{61}\} \\
L_{20} &= \{\alpha^{20}, \alpha^{41}, \alpha^{62}\}
\end{aligned}
$$



Now, $L_i = \{\alpha^i, \alpha^{i+21}, \alpha^{i+42}\}$, where $i + 21$ stands for $(i + 21) \bmod 63$ and $i + 42$ for
$(i + 42) \bmod 63$.

Similarly, $L_j = \{\alpha^j, \alpha^{j+21}, \alpha^{j+42}\}$.

Now, $T_{ij}$ is given by the union of $L_i, L_j$. Thus, $T_{ij}$ contains all possible linear combinations
of the points of $L_i$ and $L_j$. Let us divide the points of $T_{ij}$ into two parts:

1. The first part $X_1$ is given by the 6 points contained in $L_i$ and $L_j$.

2. The second part $X_2$ contains the 9 points obtained by linear combinations of the
form $a\alpha^u + b\alpha^v$, where $\alpha^u \in L_i$, $\alpha^v \in L_j$, and $a, b$ take the non-zero values
of GF(2), i.e. $a = b = 1$.

Consider any line $L_k \in L$.

• Case 1:

If $k = i$ or $k = j$, then by the given construction, it is obvious that $L_k \subseteq T_{ij}$ and
the lemma holds.

• Case 2:
Here, $k \neq i, j$.

We have $L_k = \{\alpha^k, \alpha^{k+21}, \alpha^{k+42}\}$. Also, $L_k \in L$ and $k \neq i, j$ imply that $L_k$ is
disjoint from $L_i$ and $L_j$. Thus, it has no points in common with $L_i$
and $L_j$.

Since $L_k$ has no points in common with $L_i$ and $L_j$, it cannot have any
points in common with the set of points $X_1$ of $T_{ij}$ defined above.

Now, we will prove that if $L_k$ has even a single point in common with the set of
points $X_2$ defined in point 2 above, then it has all its points in common with the
set $X_2$, which implies that $L_k \subseteq T_{ij}$. If no points are in common, then $L_k$ is
disjoint from $T_{ij}$, as required by the lemma.

Without loss of generality, let $\alpha^k = \alpha^u + \alpha^v$ for some $\alpha^u \in L_i$ and $\alpha^v \in L_j$.

From the coset decomposition given above, it is clear that if $\alpha^k \in L_k$, then
$\alpha^{k+21} \in L_k$ and $\alpha^{k+42} \in L_k$. Here again, $k + 21$ stands for $(k + 21) \bmod 63$ and
$k + 42$ for $(k + 42) \bmod 63$.

Since $\alpha$ is a generator of a multiplicative group, $\alpha^{k+21} = \alpha^k \cdot \alpha^{21}$ and $\alpha^{k+42} = \alpha^k \cdot \alpha^{42}$.

Also, one of the fundamental properties of finite fields states that the elements
commute with respect to multiplication and addition, and that multiplication
distributes over addition, i.e.

$$a \cdot (b + c) = (b + c) \cdot a = a \cdot b + a \cdot c = b \cdot a + c \cdot a \qquad (4)$$

where $a, b, c$ are elements of the field.

Consider $\alpha^{k+21}$. We have

$$
\begin{aligned}
\alpha^{k+21} &= \alpha^k \cdot \alpha^{21} && (5) \\
&= (\alpha^u + \alpha^v) \cdot \alpha^{21} && (6) \\
&= \alpha^u \cdot \alpha^{21} + \alpha^v \cdot \alpha^{21} && (7) \\
&= \alpha^{u+21} + \alpha^{v+21} && (8)
\end{aligned}
$$

Here, (7) follows because of (4) and, as usual, the addition in the indices is taken
modulo 63.

Since $\alpha^u \in L_i$ and $\alpha^v \in L_j$, from the coset decomposition scheme we have
$\alpha^{u+21} \in L_i$ and $\alpha^{v+21} \in L_j$. Thus, $\alpha^{u+21} \in T_{ij}$ and $\alpha^{v+21} \in T_{ij}$. Finally,
$(\alpha^{u+21} + \alpha^{v+21}) \in T_{ij}$, which has the straightforward implication that $\alpha^{k+21} \in T_{ij}$.

Analogous arguments for $\alpha^{k+42}$ prove that $\alpha^{k+42} \in T_{ij}$. Thus, all three points of
$L_k$ are contained in $T_{ij}$, and $L_k \subseteq T_{ij}$.

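The arguments above show that any line $L_k \in L$ is either completely contained
in $T_{ij}$ or has no points in common with it. For the sake of completeness, we
present the points of $T_{ij}$, so that it is easy to "see" the lemma:

$$
T_{ij} = \left\{
\begin{array}{lll}
\alpha^{i}, & \alpha^{i+21}, & \alpha^{i+42}, \\
\alpha^{j}, & \alpha^{j+21}, & \alpha^{j+42}, \\
\alpha^{i} + \alpha^{j}, & \alpha^{i+21} + \alpha^{j}, & \alpha^{i+42} + \alpha^{j}, \\
\alpha^{i} + \alpha^{j+21}, & \alpha^{i+21} + \alpha^{j+21}, & \alpha^{i+42} + \alpha^{j+21}, \\
\alpha^{i} + \alpha^{j+42}, & \alpha^{i+21} + \alpha^{j+42}, & \alpha^{i+42} + \alpha^{j+42}
\end{array}
\right\}
$$

□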
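For this small geometry, Lemma 9 can also be confirmed by exhaustive search. The sketch below is an illustrative check rather than part of the proof; it assumes the same representation of $GF(2^6)$ as 6-bit integers with the (assumed primitive) polynomial $x^6 + x + 1$, and the helper names are hypothetical.

```python
from itertools import combinations

PRIMITIVE_POLY = 0b1000011  # x^6 + x + 1, an assumed choice of primitive polynomial


def alpha_powers() -> list[int]:
    """Return [alpha^0, ..., alpha^62] as 6-bit integers; field addition is XOR."""
    powers, x = [], 1
    for _ in range(63):
        powers.append(x)
        x <<= 1
        if x & 0b1000000:
            x ^= PRIMITIVE_POLY
    return powers


def lines() -> list[frozenset[int]]:
    """The 21 lines L_i = {alpha^i, alpha^(i+21), alpha^(i+42)}."""
    p = alpha_powers()
    return [frozenset({p[i], p[i + 21], p[i + 42]}) for i in range(21)]


def span_of(line_a: frozenset[int], line_b: frozenset[int]) -> frozenset[int]:
    """Non-zero vectors of the GF(2)-span T_ij of two disjoint lines."""
    va, vb = line_a | {0}, line_b | {0}
    return frozenset(a ^ b for a in va for b in vb) - {0}


def lemma9_holds() -> bool:
    """Check that every T_ij has 15 points and every L_k is inside it or disjoint."""
    all_lines = lines()
    for li, lj in combinations(all_lines, 2):
        t = span_of(li, lj)
        if len(t) != 15:
            return False
        if any(not (lk <= t or lk.isdisjoint(t)) for lk in all_lines):
            return False
    return True


if __name__ == "__main__":
    print(lemma9_holds())
```

The search covers all 210 pairs of lines, so it checks the lemma for every $T_{ij}$ rather than for one representative pair.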

Lemma 10. Given the set L of 21 disjoint lines that cover all the points of P(5, GF(2)),
pick any $L_i \in L$ and take its union with each of the remaining 20 lines in L to generate 20
3-dimensional projective subspaces. Of these 20, only 5 distinct 3-dimensional projective
subspaces will exist.

Proof. Any 3-dimensional projective subspace has 3 hyperplanes containing it; refer to
corollary [7]. If a line is contained in a 3-dimensional projective subspace, then it is
contained in all the 3 hyperplanes that contain that 3-dimensional projective subspace.
Also, if a line is not contained in a 3-dimensional projective subspace, it is not contained
in any of the hyperplanes that contain it. This is because:

$$
\begin{aligned}
\dim(L_i \cup T_i) &= \dim(L_i) + \dim(T_i) - \dim(L_i \cap T_i) \\
\dim(L_i) &= 2, \quad \dim(T_i) = 4 \\
\dim(L_i \cap T_i) &= 0 \\
\Rightarrow \dim(L_i \cup T_i) &= 6
\end{aligned}
$$

where $L_i$ is the line, and $T_i$ is the 3-dimensional projective subspace not containing
the line (so that, by lemma [9], the two share no point). A hyperplane is a 5-dimensional
vector subspace. So, if $\dim(L_i \cup T_i) > 5$, $L_i$
is not contained in any hyperplane containing $T_i$.

Let $T_{ij}$ and $T_{jk}$ be two 4-dimensional vector subspaces of V that represent two 3-dimensional
projective subspaces. Then $\dim(T_{ij}) = \dim(T_{jk}) = 4$. Also,

$$\dim(T_{ij} \cup T_{jk}) = \dim(T_{ij}) + \dim(T_{jk}) - \dim(T_{ij} \cap T_{jk})$$

It is easy to see that, by virtue of the base Galois field being GF(2), $T_{ij}$ and $T_{jk}$ have 0, 1 or
3 common hyperplanes. No other case is possible. If they have 3 common hyperplanes,
then $T_{ij} = T_{jk}$. This implies that $\dim(T_{ij} \cap T_{jk}) = 4$.

If they have one hyperplane in common, the 5-dimensional vector subspace corresponding
to that hyperplane must contain both the 4-dimensional vector subspaces.
This is possible iff $\dim(T_{ij} \cup T_{jk}) = 5$, which in turn, by rank arguments, implies
$\dim(T_{ij} \cap T_{jk}) = 3$.

If they have no hyperplane in common, then $\dim(T_{ij} \cup T_{jk}) = 6$, which again, by rank
arguments, implies $\dim(T_{ij} \cap T_{jk}) = 2$.

Consider the union of line $L_i$ with $L_j \in L$, $j \neq i$. By Lemma [8], the union generates a
3-dimensional projective space. Let us call it $T_{ij}$. Similarly, let the union of line $L_i$ with
$L_k \in L$, $k \neq i, j$, be called $T_{ik}$.
By lemma [9], either $L_k \subseteq T_{ij}$ or $L_k \cap T_{ij} = \emptyset$.
If $\{L_i, L_k\} \subseteq T_{ij}$, then $T_{ik} = T_{ij}$.

If $L_k \cap T_{ij} = \emptyset$, then $T_{ik}$ is distinct from $T_{ij}$. Since exactly 3 hyperplanes contain a
3-dimensional projective subspace $T_{ik}$, which in turn contains $L_i$, 3 new hyperplanes
reachable from $L_i$ get discovered whenever we get another distinct 3-dimensional
projective subspace. Moreover, $\dim(T_{ij} \cap T_{ik}) = 2$, which implies that $T_{ij}$ and $T_{ik}$ do
not share any hyperplanes.

Applying this argument iteratively for the 20 3-dimensional projective subspaces, we see
that a maximum of 5 distinct 3-dimensional projective subspaces can be generated,
each of which contributes 3 hyperplanes to $L_i$, thus making 15 hyperplanes, which is the
total number of hyperplanes that contain a line in P(5, GF(2)).
Each 3-dimensional projective subspace, e.g. $T_{ij}$, contains 15 points and hence can
contain a maximum of 5 disjoint lines. One of them is $L_i$, and another 4 need to
be accounted for. So, when the union of $L_i$ is taken with the remaining 20 lines, a
maximum of 4 lines, out of these 20 lines, can give rise to the same 3-dimensional projective
subspace $T_{ij}$. This implies that a minimum of 20/4 = 5 3-dimensional projective
subspaces can be generated from the remaining 20 lines.

Since the maximum and the minimum are both 5, exactly 5 distinct 3-dimensional
projective subspaces are generated. Moreover,
none of these subspaces share any hyperplanes. □

5.1.5 Proof of Proposition 2

The main theorem behind the construction of the schedule mentioned in proposition 2 is
as follows.

Theorem 11. In P(5, GF(2)), given a set of 21 disjoint lines (1-dimensional projective
subspaces) that cover all the points, a set of 21 disjoint 3-dimensional projective
subspaces can be created such that they cover all the hyperplanes. Here, hyperplane
covering implies that each hyperplane of P(5, GF(2)) contains, or is reachable in the
lattice from, at least one of the 21 disjoint 3-dimensional projective subspaces as its
subspace. In this case, each point attains its cardinality of 31 reachable hyperplanes in
P(5, GF(2)) in the following manner:

1. It is reachable from 3 hyperplanes each, via 5 3-dimensional projective subspaces 
that contain the line corresponding to it, as per the partition m[3 

2. It is reachable from 1 hyperplane each, via the remaining 16 3-d projective subspaces
that necessarily cannot contain the line corresponding to it.

The dual argument with the roles of points and hyperplanes interchanged also holds. 

Proof. Generate the set of 21 disjoint lines L according to the coset decomposition
corresponding to the subgroup isomorphic to the multiplicative group of $GF(2^2)$. We choose this subgroup as the
canonical subgroup mentioned in Lemma [9], i.e. $\{\alpha^0, \alpha^{21}, \alpha^{42}\}$.

Choose any line $L_i$ from this set and take its union with the remaining 20 lines in the set
to generate 5 distinct 3-dimensional projective subspaces (as proved in lemma [10]). Call
these projective subspaces $T_1, T_2, \ldots, T_5$. Choose 5 distinct lines, each NOT equal to
$L_i$, to represent each of these projective subspaces. Such a choice exists by the preceding lemmas.
Pick the line representing $T_1$, and take its union with the other 4 lines to generate 4
new 3-dimensional projective subspaces. Choose 4 more lines (distinct from the 5 lines
used earlier) to represent these 4 new projective subspaces. Again, such distinct lines
exist by the preceding lemmas, and there are overall 21 distinct lines. Pick another line from $T_1$
(not equal to the previously used lines), and take its union with the 4 newly chosen
lines to form 4 more new 3-dimensional projective subspaces. Repeat this process
2 more times, until one gets 21 different 3-d projective subspaces, each contributing 3
distinct hyperplanes to $L_i$.

The following facts hold for the partitions of hyperplanes and points implied by the
above generation process.

1. Each line in the set L is contained in 5 3-d projective subspaces, and each 3-dimensional
projective subspace contains 5 lines from the set L.

2. A point in a line is reachable from 3 hyperplanes via every 3-d projective subspace
that contains the line, and 1 hyperplane via every projective subspace that
doesn't contain the line. In the latter case, if the 3-dimensional projective subspace
doesn't contain the line, it doesn't contain the point. Hence, the projective
subspace will only contribute one hyperplane, corresponding to the union of the
point with the 3-d projective subspace.

From the above facts, all the points of the theorem follow. The dual argument holds
in exactly the same way. One could have started with a partition of 3-dimensional
projective subspaces and generated lines by working entirely in the dual vector
space and using the exact same arguments. Hence, there are two ways of folding for
each partition of V into disjoint lines. □
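The counts claimed in Lemma 10 and Theorem 11 can be checked by brute force. The sketch below is an illustration under our own assumptions, not the authors' procedure: $GF(2^6)$ is represented with the polynomial $x^6 + x + 1$ (assumed primitive), and a hyperplane is modelled as the kernel of a non-zero 6-bit mask `h` under the bitwise dot product over GF(2). All function names are hypothetical.

```python
PRIMITIVE_POLY = 0b1000011  # x^6 + x + 1, an assumed choice of primitive polynomial


def alpha_powers() -> list[int]:
    """Return [alpha^0, ..., alpha^62] as 6-bit integers; field addition is XOR."""
    powers, x = [], 1
    for _ in range(63):
        powers.append(x)
        x <<= 1
        if x & 0b1000000:
            x ^= PRIMITIVE_POLY
    return powers


def parity(x: int) -> int:
    """Parity of the set bits of x, i.e. a dot product over GF(2)."""
    return bin(x).count("1") & 1


def build_geometry():
    """Return the 21 lines and the distinct 3-d subspaces spanned by pairs of lines."""
    p = alpha_powers()
    lines = [frozenset({p[i], p[i + 21], p[i + 42]}) for i in range(21)]
    subspaces = set()
    for i in range(21):
        for j in range(i + 1, 21):
            va, vb = lines[i] | {0}, lines[j] | {0}
            subspaces.add(frozenset(a ^ b for a in va for b in vb) - {0})
    return lines, list(subspaces)


def hyperplane_partition(subspaces):
    """Map each subspace T to the masks h whose kernel (a hyperplane) contains T."""
    return {t: [h for h in range(1, 64) if all(parity(h & v) == 0 for v in t)]
            for t in subspaces}


def degree_split(point: int, partition) -> tuple[int, int]:
    """Hyperplanes through `point`, split by whether their subspace contains it."""
    inside = outside = 0
    for t, masks in partition.items():
        count = sum(1 for h in masks if parity(h & point) == 0)
        if point in t:
            inside += count
        else:
            outside += count
    return inside, outside


if __name__ == "__main__":
    lines, subspaces = build_geometry()
    partition = hyperplane_partition(subspaces)
    print(len(subspaces), degree_split(alpha_powers()[0], partition))
```

A split of (15, 16) corresponds to the decomposition $31 = 5 \times 3 + 16 \times 1$ stated in the theorem.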

To prove Proposition 2, we use lemmas [6], [8], [9], [10] and the above main theorem. For a
system based on the point-hyperplane graph of P(5, GF(2)), the graph can be folded easily,
using theorem [11]. A schedule similar to the one used for Proposition [1] can then be
developed as follows.

We begin by assigning one processing unit to every line of the disjoint set of 21 lines.
Each processing unit has an associated local memory. After the 3-dimensional projective
subspaces have been created as explained above, we can assign a 3-d projective
space to each of the memories. The computation is again divided into two phases. In
phase 1, the points on a particular line are scheduled on the processing unit corresponding
to that line in a sequential manner. A point gets 3 data units from a memory
if the 3-dimensional projective space corresponding to that memory contains the line,
otherwise it gets 1. In phase 2, the memory already has the data corresponding to the
hyperplanes that contain the 3-dimensional projective subspace representing the memory,
and the communication is just between the processing unit and its own memory.
The output write-back cycles follow the schedule of input reads in both phases. It
is straightforward to prove, along the lines of Lemma [5], that the above distribution of data
again leads to a conflict-free memory-access pattern when all the processing units
compute in parallel, performing the same computation on different data.
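As a sanity check on the two data distributions, the memory contents can be counted. The arithmetic below is a sketch derived from the numbers stated in the text (7 points per plane with 7 and 3 data units, 15 points per 3-dimensional subspace with 3 and 1 data units); the per-memory totals of 217 and 93 are our own derived figures, not values quoted from the paper, and the function names are hypothetical.

```python
TOTAL_POINTS = 63    # points of P(5, GF(2))
POINT_DEGREE = 31    # hyperplanes through each point


def units_in_memory(points_inside: int, units_inside: int, units_outside: int) -> int:
    """Data units held by one memory.

    A point lying in the subspace assigned to the memory contributes
    `units_inside` data units; every other point contributes `units_outside`.
    """
    points_outside = TOTAL_POINTS - points_inside
    return points_inside * units_inside + points_outside * units_outside


def scheme_totals() -> dict[str, tuple[int, int]]:
    """(number of memories, data units per memory) for both folding schemes."""
    plane_scheme = units_in_memory(7, 7, 3)    # 7 * 7 + 56 * 3
    line_scheme = units_in_memory(15, 3, 1)    # 15 * 3 + 48 * 1
    return {"planes": (9, plane_scheme), "lines": (21, line_scheme)}


if __name__ == "__main__":
    for name, (memories, units) in scheme_totals().items():
        print(name, memories, units, memories * units)
```

In both schemes the memories together hold $63 \times 31 = 1953$ data units, one per edge of the bipartite graph, so no edge is stored twice and none is missing.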

6 Generalization of Folding Scheme to Arbitrary 
Projective Geometries 

In the previous section, we gave a complete construction of graph folding, and the corresponding
scheduling, for the example of P(5, GF(2)). In this section, we generalize the above propositions
to projective geometries of arbitrary dimension m over an arbitrary non-binary Galois
field GF(q). The generalization is carried out for cases where $(m + 1)$ is not a prime
number. By extending the Prime Number Theorem to integer powers of some fixed
number, it is expected that not many cases (values of q) are left out by such
restricted coverage.

A P(m, GF(q)) is represented using the elements of a vector space V of dimension
$(m + 1)$ over GF(q). If $(m + 1)$ is not prime, it can be factored into non-trivial prime
factors $p_1, p_2, p_3, \ldots, p_n$ such that
$p_1 \times p_2 \times \ldots \times p_n = (m + 1)$. The dimensions of the projective subspaces used for partitioning vary
from 1 to $\left(\frac{m-1}{2}\right)$, and the dimensions of the corresponding vector subspaces of V vary
from 2 to $\left(\frac{m+1}{2}\right)$. The points are the 0-dimensional projective subspaces (represented
by the 1-dimensional vector subspaces of V) and the hyperplanes are the $(m - 1)$-dimensional
projective subspaces.

It is convenient to describe the folding scheme in two separate cases. 

6.1 Folding for Geometry with Odd dimension 

Suppose (m + 1) is even. We prove the following lemmas. 

Lemma 12. (Generalization of Lemma [1]) In P(m, GF(q)) with odd m, the set of
points, which has cardinality $\left(\frac{q^{m+1}-1}{q-1}\right)$, can be partitioned into disjoint subsets. Each
block of this partition is a vector subspace having dimension $\left(\frac{m+1}{2}\right)$, and contains
$\left(\frac{q^{(m+1)/2}-1}{q-1}\right)$ points.

Proof. Let the vector space V, corresponding to P(m, GF(q)), be represented by
$GF(q^{m+1})$. Since $(m + 1)$ is even, $\left(\frac{m+1}{2}\right)$ divides $(m + 1)$. Hence, $GF(q^{(m+1)/2})$ is a
sub-field of $GF(q^{m+1})$. The multiplicative cyclic group of $GF(q^{(m+1)/2})$ is isomorphic
to a subgroup of the multiplicative cyclic group of $GF(q^{m+1})$. Hence, we can again
perform a coset decomposition to generate disjoint blocks of a partition of the corresponding
vector space V into subsets. Each such subset is a $\left(\frac{m+1}{2}\right)$-dimensional vector subspace
(minus $\{0\}$), say $S_i$, representing a $\left(\frac{m-1}{2}\right)$-dimensional projective space, i.e. a plane in the case $m = 5$ [2]. By
the property of vector subspaces, if $x \in S_i$ then $\lambda \cdot x \in S_i$, where $\lambda \in GF(q)$, $\lambda \neq 0$. Since each
point represents an equivalence class of vectors, except 0, a projective subspace of dimension
$\left(\frac{m-1}{2}\right)$, corresponding to each partitioned vector subspace, contains exactly
$\left(\frac{q^{(m+1)/2}-1}{q-1}\right)$ points. □

Lemma 13. (Generalization of the earlier corollary) In P(m, GF(q)) with odd m, the set of
hyperplanes, which has cardinality $\left(\frac{q^{m+1}-1}{q-1}\right)$, can be partitioned into disjoint subsets.
Each block of this partition is a vector subspace having dimension $\left(\frac{m+1}{2}\right)$, and contains
$\left(\frac{q^{(m+1)/2}-1}{q-1}\right)$ hyperplanes.

Proof. Because of the duality of points and hyperplanes, there are an equal number of
hyperplanes containing each $\left(\frac{m+1}{2} - 1\right)$-dimensional projective subspace. Further, the
set of such subspaces together covers all the hyperplanes of P(m, GF(q)). Here, hyperplane
covering implies that each hyperplane of P(m, GF(q)) contains, or is reachable
in the lattice from, at least one of the chosen disjoint $\left(\frac{m-1}{2}\right)$-dimensional projective
subspaces as its subspace. Moreover, since we have a disjoint partition of points, there will
exist a corresponding disjoint partition of hyperplanes. The number of such partitions
can easily be seen to be $\left(\frac{q^{m+1}-1}{q^{(m+1)/2}-1}\right)$. □
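The cardinalities in Lemmas 12 and 13 are easy to tabulate. The following sketch simply evaluates the closed forms for given `m` and `q`; the function names are hypothetical, and the code assumes that `q` is a prime power and `m` is odd, without verifying that `q` is a prime power.

```python
def num_points(m: int, q: int) -> int:
    """Number of points (equivalently hyperplanes) of P(m, GF(q))."""
    return (q ** (m + 1) - 1) // (q - 1)


def block_size(m: int, q: int) -> int:
    """Points in one block: a subspace of vector dimension (m + 1) / 2."""
    if m % 2 == 0:
        raise ValueError("this construction needs odd m, i.e. even m + 1")
    return (q ** ((m + 1) // 2) - 1) // (q - 1)


def num_blocks(m: int, q: int) -> int:
    """Number of blocks, (q^(m+1) - 1) / (q^((m+1)/2) - 1) = q^((m+1)/2) + 1."""
    return num_points(m, q) // block_size(m, q)


if __name__ == "__main__":
    for m, q in [(5, 2), (3, 3), (7, 2)]:
        print(m, q, num_points(m, q), block_size(m, q), num_blocks(m, q))
```

For P(5, GF(2)) this reproduces the 63 points, 7 points per plane and 9 planes used in the earlier section.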

The following theorem extends the partitioning portion of Proposition [1], as detailed
in its proof.

Theorem 14. In P(m, GF(q)) with odd m, each point is contained in $\left(\frac{q^{(m+1)/2}-1}{q-1}\right)$
hyperplanes belonging to the unique block of the partition that contains this point, and
in $\left(\frac{q^{(m-1)/2}-1}{q-1}\right)$ hyperplanes from each of the remaining blocks of the partition, which do not contain this
point.

Proof. By equation [3], each point has a total of $\left(\frac{q^{m}-1}{q-1}\right)$ hyperplanes containing it. By
putting lemmas [12] and [13] together, one can see that it is contained in $\left(\frac{q^{(m+1)/2}-1}{q-1}\right)$
hyperplanes belonging to the partition that contains this point. It is also contained
in $\left(\frac{q^{(m-1)/2}-1}{q-1}\right)$ hyperplanes belonging to each of the remaining $\left(\frac{q^{m+1}-1}{q^{(m+1)/2}-1}\right) - 1 = q^{(m+1)/2}$
partitions, in the following manner, using the two lemmas below.

Lemma 15. (Generalization of an earlier lemma) In P(m, GF(q)), from any set of $\left(\frac{q^{(m-1)/2}-1}{q-1} + 1\right)$
points (hyperplanes), it is possible to find $\left(\frac{m-1}{2} + 1\right) = \left(\frac{m+1}{2}\right)$ independent points (hyperplanes).

Proof. As mentioned earlier via lemma [13], each block of the partition of the hyperplane
set is covered by a vector subspace of dimension $\left(\frac{m+1}{2}\right)$. This representative vector
subspace, in turn, is formed by $\left(\frac{m+1}{2}\right)$ independent points and all their linear combinations
contained in it, using coefficients from GF(q). The total number of points
contained in a $\left(\frac{m-1}{2}\right)$-dimensional vector space, based on $\left(\frac{m-1}{2}\right)$ independent points,
is $\left(\frac{q^{(m-1)/2}-1}{q-1}\right)$. Therefore, in any set of $\left(\frac{q^{(m-1)/2}-1}{q-1} + 1\right)$ points, it is possible to find
$\left(\frac{m-1}{2} + 1\right) = \left(\frac{m+1}{2}\right)$ independent points.

By duality in the PG lattice, it is straightforward to prove that in any set of $\left(\frac{q^{(m-1)/2}-1}{q-1} + 1\right)$
hyperplanes, it is possible to find $\left(\frac{m-1}{2} + 1\right) = \left(\frac{m+1}{2}\right)$ independent hyperplanes as
well. □

Lemma 16. (Generalization of an earlier lemma) Any point that does not lie in a $\left(\frac{m-1}{2}\right)$-dimensional
projective subspace is reachable from exactly $\left(\frac{q^{(m-1)/2}-1}{q-1}\right)$ hyperplanes through
that projective subspace, which is contained in $\left(\frac{q^{(m+1)/2}-1}{q-1}\right)$ hyperplanes overall.

Proof. If that point is reachable from any more hyperplanes than $\left(\frac{q^{(m-1)/2}-1}{q-1}\right)$, then
by lemma [15] we could find $\left(\frac{m+1}{2}\right)$ independent hyperplanes reachable from both this
point as well as a $\left(\frac{m-1}{2}\right)$-dimensional projective subspace. A $\left(\frac{m-1}{2}\right)$-dimensional projective
subspace contains exactly $\left(\frac{m+1}{2}\right)$ independent points and hyperplanes. Hence the
presence of $\left(\frac{m+1}{2}\right)$ independent hyperplanes containing a particular $\left(\frac{m-1}{2}\right)$-dimensional
projective subspace implies that all the $\left(\frac{m+1}{2}\right)$ independent points also contained in the
same subspace, and their linear combinations, are exactly the points that are reachable
from such a set of hyperplanes. This contradicts the fact that the original point was not
one of the $\left(\frac{m+1}{2}\right)$ independent points reachable from the $\left(\frac{m-1}{2}\right)$-dimensional projective
subspace.

If that point is contained in any fewer hyperplanes, then the point would not be reachable
from the established number of hyperplanes in the above partition, as governed by
its degree. From the equations below, it is easy to see that if even 1 partition not
containing the considered point contributes even 1 hyperplane fewer towards the degree of
the considered point, the overall degree of the point in the bipartite graph cannot be
achieved.

$$
\begin{aligned}
\text{Hyperplanes from partitions not containing the point}
&= \frac{q^{(m-1)/2}-1}{q-1} \cdot q^{(m+1)/2}
= \frac{q^{m} - q^{(m+1)/2}}{q-1} \\
\text{Total degree}
&= (\text{from partition containing the point}) + (\text{from partitions not containing the point}) \\
&= \frac{q^{(m+1)/2}-1}{q-1} + \frac{q^{m} - q^{(m+1)/2}}{q-1} \\
&= \frac{q^{m}-1}{q-1}, \quad \text{as required.}
\end{aligned}
$$

□



Putting together these two lemmas, we arrive at the desired conclusion for theorem
[14]. □

Given the above construction, it is easy to develop a folding architecture and scheduling
strategy by extending the scheduling for P(5, GF(2)) in section [5.1.3] in a straightforward
way. One processing unit is once again assigned to each of the disjoint partitions
of points. Using a memory of size ( '^ _~ j collocated with the processing unit, it is 

again possible to provably generate a conflict-free memory access pattern, by extending
Lemma [5]. We omit the details of the scheduling strategy, since it is a simple extension
of the strategies discussed earlier for a 5-dimensional projective space.
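The degree bookkeeping behind the odd-dimension case can be checked numerically. The sketch below evaluates the two contributions to a point's degree for a few values of `m` and `q` and compares their sum with $(q^m - 1)/(q - 1)$; the function names are hypothetical, and `q` is assumed to be a prime power without being verified.

```python
def point_degree(m: int, q: int) -> int:
    """Number of hyperplanes through a point of P(m, GF(q))."""
    return (q ** m - 1) // (q - 1)


def degree_split(m: int, q: int) -> tuple[int, int, int]:
    """(hyperplanes from own block, hyperplanes per other block, other blocks)."""
    if m % 2 == 0:
        raise ValueError("this construction needs odd m, i.e. even m + 1")
    half = (m + 1) // 2
    own = (q ** half - 1) // (q - 1)
    per_other = (q ** (half - 1) - 1) // (q - 1)
    others = q ** half
    return own, per_other, others


def split_matches_degree(m: int, q: int) -> bool:
    """True if own + per_other * others equals the degree of a point."""
    own, per_other, others = degree_split(m, q)
    return own + per_other * others == point_degree(m, q)


if __name__ == "__main__":
    for m, q in [(5, 2), (3, 3), (7, 4)]:
        print(m, q, degree_split(m, q), split_matches_degree(m, q))
```

For P(5, GF(2)) the split is (7, 3, 8), matching the $7 + 3 \times 8 = 31$ data units fetched per point in the schedule of the previous section.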

6.2 Folding for Geometry with Even-but-factorizable dimension

If $(m + 1)$ is not even, then let $m + 1 = (k + 1) \times t$, where $k \geq 1$ and $t \geq 3$ ($t = 2$
comes under case 1). Then there exists a projective subspace of dimension k, and its
dual projective subspace will be of dimension $(m - k - 1)$. We will use these projective
subspaces to partition the points and hyperplanes into disjoint sets, and then assign
these sets to processing units.

Lemma 17. (Generalization of an earlier lemma) In P(m, GF(q)) with m factorizable as
$(m + 1) = (k + 1) \times t$, the set of points can be partitioned into disjoint subsets. Each
block of this partition is a projective subspace of dimension k (a vector subspace of
dimension $k + 1$), and all blocks have the same cardinality.

Proof. Let the vector space equivalent of P(m, GF(q)) be V. V has dimension $(m + 1)$,
and a vector subspace of V which corresponds to a projective subspace of dimension k
has dimension $(k + 1)$. Since $(k + 1)$ divides $(m + 1)$, we can partition V into
disjoint subsets, each set having $(k + 1)$ independent vectors [2]. The subsets are
obtained by coset decomposition of the multiplicative group of $GF(q^{m+1})$. The blocks
of this partition are vector subspaces of dimension $(k + 1)$, and hence represent a
k-dimensional projective subspace each.

Let $\mathcal{S}$ denote the collection of these identical-sized subsets (vector subspaces), and let
the $i$-th subset be denoted by $S_i$. An equivalent subset of points of the projective space can
be obtained from each coset $S_i$ as the set of equivalence classes under the equivalence
relation $a_i = \lambda \cdot a_j$, where $a_i, a_j \in S_i$ and $\lambda \in GF(q)$. Also, since $t \geq 3$, we have
$k < \lfloor \frac{m}{2} \rfloor$. □

Theorem 18. (Generalization of an earlier corollary and lemma, and their combination)
It is possible to construct a set of dual ($(m - k - 1)$-dimensional) projective subspaces,
using the above point sets, such that no hyperplane contains two of these subspaces.
Further, they together cover all the hyperplanes of P(m, GF(q)), thus
creating a disjoint partition of the set of hyperplanes. Here, hyperplane covering implies
that each hyperplane of P(m, GF(q)) contains, or is reachable in the lattice from, at
least one of the chosen disjoint $(m - k - 1)$-dimensional projective subspaces as its
subspace.

Proof. In a PG lattice, $(m - k)$ independent points are required to create a $(m - k - 1)$-dimensional
(dual) projective subspace. Since $\frac{m-k}{k+1} = \frac{m+1}{k+1} - 1 = (t - 1)$, the
union of any $(t - 1)$ disjoint sets taken from $\mathcal{S}$, each containing $(k + 1)$ independent
points, and the points that are all possible linear combinations over GF(q) of these,
will form a $(m - k - 1)$-dimensional projective subspace. The points represent the
equivalence classes mentioned in lemma [17].

Without loss of generality, let the first $(t - 1)$ sets, $S_0, S_1, S_2, S_3, \ldots, S_{t-3}, S_{t-2} \in \mathcal{S}$,
be combined to make some $(m - k - 1)$-dimensional projective space $T_i$.
Here, $S_0, S_1, S_2, S_3, \ldots, S_{t-2}$ are cosets that have been obtained by the coset decomposition
of the nonzero elements of $GF(q^{m+1})$. The elements of the $i$-th (co)set in $\mathcal{S}$ can
be written as:

$$S_i = \{\alpha^{i}, \alpha^{i+\beta}, \alpha^{i+2\beta}, \ldots \; (q^{k+1} - 1) \text{ terms}\}$$

where $\alpha$ is the generator of the multiplicative group of $GF(q^{m+1})$, and
$\{0, \alpha^{0}, \alpha^{\beta}, \alpha^{2\beta}, \ldots \; (q^{k+1} - 1) \text{ terms}\}$, with $\beta = \left(\frac{q^{m+1}-1}{q^{k+1}-1}\right)$,
forms a subfield of $GF(q^{m+1})$ that is isomorphic to $GF(q^{k+1})$.
Moreover, any $S_i \in \mathcal{S}$ contains only the non-zero elements of a vector space over
GF(q), and all their possible linear combinations of the form $(c_0 a_0 + c_1 a_1 + \ldots + c_n a_n)$,
$\forall c_0, c_1, \ldots, c_n \in GF(q)$. Here, $a_0, a_1, \ldots, a_n \in S_i$, and the c's are not all simultaneously
0. We will need the following two lemmas and their corollaries to complete
the proof.

Lemma 19. (Generalization of lemma [9]) Consider $S_k \in \mathcal{S}$, $k \neq 0, 1, 2, 3, \ldots, (t - 2)$.
If even one point of $S_k$ is common with $T_i$, then all points of $S_k$ must be common and
thus $S_k \subseteq T_i$.

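Proof. Divide the set of points of $T_i$ into two parts:

• $X_1$ consists of the points of $S_0, S_1, \ldots, S_{t-2}$.

• $X_2$ is the set of points of the form $c_0\alpha^{u_0} + c_1\alpha^{u_1} + \ldots + c_{t-2}\alpha^{u_{t-2}}$,
where $\alpha^{u_i} \in S_i$ and $c_i \in GF(q)$, and there are at least two non-zero $c_i$'s.

Moreover, if $\alpha^{u_i} \in S_i$, then $c_i\alpha^{u_i} \in S_i$ for every non-zero $c_i \in GF(q)$. This is because $c_i$ also belongs
to $GF(q^{k+1})$, and hence is some power of $\alpha^{\beta}$. Therefore, we can abuse notation and
simply write

$$X_2 \text{ is the set of all points of the form } \alpha^{u_0} + \alpha^{u_1} + \ldots + \alpha^{u_{t-2}} \qquad (9)$$

such that there are at least two non-zero terms in the summation.
Let $S_k = \{\alpha^{k}, \alpha^{k+\beta}, \alpha^{k+2\beta}, \ldots\}$. Consider $\alpha^{k} \in S_k$. It is clear that, since $S_k$ is disjoint
from $S_i$, $i \in \{0, 1, \ldots, (t - 2)\}$, $\alpha^{k} \notin X_1$.

Suppose $\alpha^{k} \in X_2$, i.e. $\alpha^{k} = \alpha^{u_0} + \alpha^{u_1} + \ldots + \alpha^{u_{t-2}}$. Then we have

$$
\begin{aligned}
\alpha^{k+\beta} &= \alpha^{k} \alpha^{\beta} && (10) \\
&= (\alpha^{u_0} + \alpha^{u_1} + \ldots + \alpha^{u_{t-2}}) \, \alpha^{\beta} && (11) \\
&= \alpha^{u_0+\beta} + \alpha^{u_1+\beta} + \ldots + \alpha^{u_{t-2}+\beta} && (12)
\end{aligned}
$$

Now, if $\alpha^{s} \in S_j$ for some $s, j$, then $\alpha^{s+\beta} \in S_j$. Therefore, it is clear that equation
(12) represents some linear combination of elements of $S_0, \ldots, S_{t-2}$ and hence must be
contained in $T_i$.

Proceeding in a similar way for all multiples of $\beta$, we find that all points of $S_k$ are
eventually found to be part of $T_i$, where we started with just one point being part of
$T_i$. □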

Corollary 20. For any $T_i$ that has been generated by taking $(t-1)$ projective subspaces
from $\mathbb{S}$, the remaining $S_i$'s are either contained in $T_i$, or have no intersection
with $T_i$.

Lemma 21. Any two $(m-k-1)$-dimensional projective subspaces constructed as
above intersect in a vector subspace of dimension $(t-2)(k+1)$.

Proof. Without loss of generality, consider two specific $(m-k-1)$-dimensional projective
subspaces represented by their corresponding vector subspaces, created using the
construction mentioned above. For example, $T_i = S_0 \cup S_1 \cup S_2 \cup S_3 \cup \ldots \cup S_{t-3} \cup S_{t-2}$,
and $T_j = S_1 \cup S_2 \cup S_3 \cup \ldots \cup S_{t-3} \cup S_{t-2} \cup S_{t-1}$. Here, we represent both $T_i$ and
$T_j$ with only the linearly independent $S_k$ that are contained in them. By Corollary 20,
the remaining $S_i$ are either linearly dependent on these, or do not intersect $T_i$ / $T_j$ at
all. If we have $\dim(T_i \cap T_j) = (t-2)(k+1)$, then

$\dim(T_i \cup T_j) = \dim(T_i) + \dim(T_j) - \dim(T_i \cap T_j)$

$= 2m - 2k - (k+1)(t-2)$

$= 2m - 2k - (k+1)t + 2(k+1)$

$= m + 1$ (since $m + 1 = (k+1)t$)



Since $\mathbf{V}$ is the overall vector space of dimension $(m+1)$, and both $T_i$ and $T_j$ are
represented by subspaces of $\mathbf{V}$, $\dim(T_i \cup T_j) \leq m + 1$. Thus, if $\dim(T_i \cap T_j) <
(t-2)(k+1)$, we get $\dim(T_i \cup T_j) > m + 1$, which is a contradiction.
Also, Property 1 implies that $T_i$ and $T_j$ intersect in a finite number of $S_i \in \mathbb{S}$.
This means that they intersect in a dimension which is a multiple of $(k+1)$. Thus,
if $\dim(T_i \cap T_j) > (t-2)(k+1)$, both $T_i$ and $T_j$ become identical (they share
$(t-1)(k+1)$ independent points). Thus, the only possible value of $\dim(T_i \cap T_j)$
is $(t-2)(k+1)$. □
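
Corollary 20 and Lemma 21 can be checked by brute force on a small instance. The sketch below does this for $P(5, GF(2))$ with $k = 1$, $t = 3$, where each $T$ is generated by $(t-1) = 2$ blocks and the lemma predicts intersections of vector dimension $(t-2)(k+1) = 2$, i.e. 3 non-zero points. It assumes the field is built from the primitive polynomial $x^6 + x + 1$; `power_table` and `span` are hypothetical helper names, and the check is only an illustration of the statements, not part of the proof.

```python
# Sketch: brute-force check of Corollary 20 and Lemma 21 for q = 2, m = 5, k = 1, t = 3.
# Assumption: GF(64) is built from the primitive polynomial x^6 + x + 1.
from itertools import combinations

ORDER, BETA, PRIM_POLY = 63, 21, 0b1000011

def power_table():
    """Return [alpha^0, ..., alpha^62] as 6-bit integers."""
    table, x = [], 1
    for _ in range(ORDER):
        table.append(x)
        x <<= 1
        if x & 0b1000000:
            x ^= PRIM_POLY
    return table

ALPHA = power_table()

# S_i as frozensets of non-zero field elements (6-bit vectors over GF(2)).
S = [frozenset(ALPHA[(i + j * BETA) % ORDER] for j in range(3)) for i in range(BETA)]

def span(vectors):
    """All non-zero GF(2)-linear combinations of the given vectors."""
    closed = {0}
    for v in vectors:
        closed |= {c ^ v for c in closed}
    return frozenset(closed - {0})

# Every T is the span of two distinct blocks; duplicates collapse in the set.
T = {span(S[a] | S[b]) for a, b in combinations(range(BETA), 2)}

# Corollary 20: every S_i is either inside a given T or disjoint from it.
contained_or_disjoint = all(s <= t or not (s & t) for s in S for t in T)
# Lemma 21: the sizes of pairwise intersections of distinct T's.
intersection_sizes = {len(t1 & t2) for t1, t2 in combinations(T, 2)}
print(len(T), contained_or_disjoint, intersection_sizes)
```

For this instance the number of distinct $T$'s should equal the number of blocks $S_i$, in line with the duality argument used to finish the proof.
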
Corollary 22. Any two $(m-k-1)$-dimensional projective subspaces constructed as
above do not have any hyperplanes in common. This is because $\dim(T_i \cup T_j) > m$, so no
hyperplane ($m$-dimensional vector subspace of $\mathbf{V}$) can contain both $T_i$ and $T_j$.

To finish off the constructive proof of Theorem 18, we just need to generate various
$T_i$ by collecting different sets of $S_i \in \mathbb{S}$, and including in $T_i$ all possible linear
combinations over $GF(q)$ of all vectors in this union of $S_i$. Corollary 22 implies that
distinct $T_i$ will not share any hyperplanes. Since $\mathbb{S}$ is exhaustive (i.e. it covers all the
points), going through all possible combinations of $S_i$ to generate $T_i$ will generate a set
$\mathbb{T}$ of $(m-k-1)$-dimensional projective subspaces, which will exhaustively cover all
the hyperplanes in $P(m, GF(q))$ (any $T_i$ can be represented by the set of hyperplanes
that it is contained in). Thus, we will have a partition of hyperplanes which has a
cardinality equal to that of the set $\mathbb{S}$ (duality of points and hyperplanes). □

6.2.1 Properties of Partitions 

The following facts hold for the partitions obtained above: 

1. No. of hyperplanes per $T_i$ = No. of points per $S_i$ = $\frac{q^{k+1}-1}{q-1} = \phi(k, k-1, q)$

2. No. of hyperplanes per $S_i$ = $\frac{q^{m-k}-1}{q-1} = \phi(m-k-1, m-1-k-1, q)$

3. Each $S_i$ is contained in exactly $\frac{q^{m-k}-1}{q^{k+1}-1}$ distinct $T_i$.

Property 3 is a consequence of the following. If $S_i \cap T_i = \emptyset$, then

$\dim(T_i \cup S_i) = \dim(T_i) + \dim(S_i) - \dim(T_i \cap S_i)$
$= m - k + k + 1 - 0$
$= m + 1$

so $T_i$ and $S_i$ together span the whole of $\mathbf{V}$. Therefore, if $S_i \cap T_i = \emptyset$, then $S_i$ is
contained in none of the hyperplanes that contain $T_i$. Also, if $S_i$ is contained in $T_i$, it is
contained in all the hyperplanes that contain $T_i$. From Corollary 20 it follows that no other
case is possible. Thus, every $S_i$ is contained in exactly
$\frac{\phi(m-k-1, m-1-k-1, q)}{\phi(k, k-1, q)} = \frac{q^{m-k}-1}{q^{k+1}-1}$ $T_i$'s.
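
The three counts above are easy to tabulate for concrete parameters. The following sketch assumes that $\phi(n, l, q)$ denotes the number of $l$-dimensional projective subspaces of $P(n, GF(q))$, which is consistent with how it is used in Properties 1 and 2; the function names `phi` and `partition_counts` are hypothetical.

```python
def phi(n, l, q):
    """Number of l-dimensional projective subspaces of P(n, GF(q)) (assumed meaning)."""
    num = den = 1
    for i in range(l + 1):
        num *= q ** (n + 1) - q ** i
        den *= q ** (l + 1) - q ** i
    return num // den

def partition_counts(m, k, q):
    """Return (hyperplanes per T, hyperplanes per S, T's per S, number of blocks)."""
    if (m + 1) % (k + 1) != 0:
        raise ValueError("the construction needs (k + 1) to divide (m + 1)")
    hyperplanes_per_T = phi(k, k - 1, q)              # Property 1
    hyperplanes_per_S = phi(m - k - 1, m - k - 2, q)  # Property 2
    T_per_S = hyperplanes_per_S // hyperplanes_per_T  # Property 3
    num_blocks = (q ** (m + 1) - 1) // (q ** (k + 1) - 1)
    return hyperplanes_per_T, hyperplanes_per_S, T_per_S, num_blocks

print(partition_counts(5, 1, 2))
```

For $P(5, GF(2))$ with $k = 1$ the last entry is 21, matching the 21 processing units mentioned for that case.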

6.2.2 Scheduling 

The above stated facts can be used to generate a construction and schedule analogous
to the one used for Theorem 11, as follows.

For computations described in Section 3.2, we begin by assigning one processing unit
to each $S_i$. To each of these processing units, we also assign one local memory. A
$T_i$ containing the corresponding $S_i$ is also assigned to the processing unit. The
computations associated with points contained in $S_i$ are executed on the corresponding
processing unit in a sequential fashion. Once the computations corresponding to the
points are finished, the computations associated with the hyperplanes containing $T_i$
are executed on the corresponding processing unit in a sequential manner. Each point
gets data corresponding to $\phi(k, k-1, q)$ hyperplanes from every $T_i$ that contains $S_i$,
when its computation gets scheduled. The remaining $T_j$'s (the ones not containing
this point, call it $A$) have the following property:

$\dim(T_j \cup A) = \dim(T_j) + \dim(A) - \dim(T_j \cap A)$
$= (m - k) + 1 - 0$
$= m + 1 - k$



Therefore, the number of hyperplanes reachable from $T_j$, for point $A$, will be the number
of hyperplanes containing the projective subspace $(T_j \cup A)$. It is an $(m+1-k)$-dimensional
vector subspace and therefore an $(m-k)$-dimensional projective subspace. The number of
hyperplanes containing it is given by:
$\phi(m - (m-k) - 1, m - 1 - (m-k) - 1, q) = \phi(k-1, k-2, q)$

Lemma 23. (Generalization of Theorem 11) In the above construction, the computation
on a particular point can reach, and get data from, all the hyperplanes that contain it
(equal in number to the degree of the point vertex). The dual argument for hyperplane
computations is also true.

Proof. Let any point $A$ be contained in a particular $S_i$. In the above construction, we
have shown that each $S_i$ is contained in $\frac{q^{m-k}-1}{q^{k+1}-1}$ $T$'s. Each of the $T$'s has
$\phi(k, k-1, q)$ hyperplanes associated with it. Also, all of these hyperplanes will
contain the point $A$. Thus, $A$ is contained in $\phi(m-k-1, m-1-k-1, q)$ hyperplanes
via the $T$'s that contain the $S_i$ in which point $A$ lies. From each of the remaining $T$'s,
it gets a degree of $\phi(k-1, k-2, q)$ hyperplanes (shown in the previous paragraph). It
can be verified, by simple calculations, that

$\phi(m-k-1, m-1-k-1, q) + \phi(k-1, k-2, q) \cdot \left( \frac{q^{m+1}-1}{q^{k+1}-1} - \frac{q^{m-k}-1}{q^{k+1}-1} \right) = \phi(m-1, m-2, q)$.

Since $\phi(m-1, m-2, q)$ is exactly equal to the number of hyperplanes that contain
$A$, we have established the desired result.

The dual arguments apply to the hyperplanes. □
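
The "simple calculations" behind the degree identity can be spot-checked numerically. The sketch below uses the closed form $\phi(d, d-1, q) = (q^{d+1}-1)/(q-1)$ for the number of hyperplanes of $P(d, GF(q))$, and takes the number of remaining $T$'s to be the total number of $T$'s, $\frac{q^{m+1}-1}{q^{k+1}-1}$, minus those containing $S_i$. The function names are hypothetical and the loop covers only a few sample parameters.

```python
def hyperplanes(d, q):
    """phi(d, d - 1, q): number of hyperplanes (equivalently points) of P(d, GF(q))."""
    return (q ** (d + 1) - 1) // (q - 1)

def degree_via_partition(m, k, q):
    """Hyperplanes reached by a point, summed over the T's as in the proof of Lemma 23."""
    total_T = (q ** (m + 1) - 1) // (q ** (k + 1) - 1)
    containing = (q ** (m - k) - 1) // (q ** (k + 1) - 1)
    via_containing = hyperplanes(m - k - 1, q)        # phi(m-k-1, m-1-k-1, q)
    via_remaining = hyperplanes(k - 1, q) * (total_T - containing)
    return via_containing + via_remaining

# Compare against phi(m-1, m-2, q) for a few prime-power q and m + 1 = (k + 1) * t.
for q in (2, 3, 4, 5):
    for k in range(0, 3):
        for t in range(2, 5):
            m = (k + 1) * t - 1
            assert degree_via_partition(m, k, q) == hyperplanes(m - 1, q)

print(degree_via_partition(5, 1, 2), hyperplanes(4, 2))
```

In the $P(5, GF(2))$, $k = 1$ case the two contributions are 15 and 16, which add up to the point degree 31.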

The incidence relations can be utilized to generate a data distribution similar to the
case of 21 processing units for $P(5, GF(2))$. A corresponding schedule naturally follows.
For a point, the processing unit $i$ starts by picking up data from its local memory.
It then cycles through the remaining memories ($j$'s) and picks up data corresponding
to the hyperplanes shared between the point and $T_j$. The address generation for the
memories is just a counter if the data is written into the memories in the order in which
it will be accessed. For the computation scheduled on behalf of a hyperplane, the
same access pattern is followed, but an address look-up is required.
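
One simple visiting order that fits this description is a cyclic rotation, in which unit $i$ reads memory $(i + s) \bmod n$ in time slot $s$. The sketch below only illustrates why such a rotation is conflict-free at the granularity of time slots; the rotation order, the slot model and the names `cyclic_schedule` and `is_conflict_free` are assumptions made for illustration, and the text above does not fix the exact order or the number of cycles spent on each memory.

```python
def cyclic_schedule(num_units):
    """schedule[s][i] is the memory index read by unit i in time slot s (assumed order)."""
    return [[(i + s) % num_units for i in range(num_units)]
            for s in range(num_units)]

def is_conflict_free(schedule):
    """True if no two units read the same memory in the same time slot."""
    return all(len(set(slot)) == len(slot) for slot in schedule)

schedule = cyclic_schedule(21)          # 21 units, as in the P(5, GF(2)) example
print(schedule[0][:4], schedule[1][:4], is_conflict_free(schedule))
```

In slot 0 every unit reads its own local memory, and over 21 slots every unit visits every memory exactly once.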

7 Prototyping Results 

The folding scheme described in this paper was employed to design a decoder for
DVD-R/CD-ROM purposes [6], [1], while another folding scheme described in [12]
was used to design another decoder system described in [11]. Both the designs are
patent pending. For the former decoder system, (31, 25, 7) Reed-Solomon codes were
chosen as subcodes, and the (63 point, 63 hyperplane) bipartite graph from $P(5, GF(2))$
was chosen as the expander graph. The overall expander code was thus a (1953, 1197,
761)-code. A folding factor of 9 was used for the above expander graph to do the
detailed design.

The design was implemented on a Xilinx Virtex-5 LX110T FPGA [13]. The post place-
and-route frequency was estimated as 180.83 MHz. The estimated throughput of the
system at this frequency is approximately 125 Mbytes/s. For a 72x CD-ROM read system, the
data transfer rate is 10.8 Mbytes/s. Thus the throughput of the system designed by us is
more than ten times what such a read system requires.
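
The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes one code symbol per edge of the bipartite graph, the usual lower bound on the dimension of an expander code (length minus the total number of subcode parity checks), and a 1x CD-ROM data rate of 150 Kbytes/s; these assumptions are ours and are not stated in the text, although they reproduce the quoted length, dimension and data rate.

```python
# Sketch: sanity check of the expander code parameters and the throughput margin.
vertices_per_side = 63            # points (and hyperplanes) of P(5, GF(2))
degree = (2 ** 5 - 1) // (2 - 1)  # hyperplanes through each point: 31
rs_n, rs_k = 31, 25               # (31, 25, 7) Reed-Solomon subcode

length = vertices_per_side * degree                         # one symbol per edge
dimension = length - 2 * vertices_per_side * (rs_n - rs_k)  # assumed lower bound

cd_1x_kbytes_per_s = 150          # assumption: 1x CD-ROM rate of 150 Kbytes/s
required_mbytes_per_s = 72 * cd_1x_kbytes_per_s / 1000
headroom = 125 / required_mbytes_per_s

print(length, dimension, required_mbytes_per_s, round(headroom, 1))
```

The minimum distance 761 is not checked here, since it does not follow from this simple counting.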

8 Conclusion 

We have presented a detailed strategy for folding computations that have
been derived from projective geometry based graphs. The scheme is based on partitioning
of projective spaces into disjoint subspaces. The symmetry inherent in projective
geometry graphs gives rise to conflict-freedom in memory accesses, and also regular
data distribution. The throughput achieved by such folding schemes is optimal, since
the schemes do not entail data shuffling overheads (refer to Section 5.1.3). Such schemes
have also been employed in real systems design. As such, we have found many
applications of projective geometry based graphs in certain areas, most notably in error
correction coding and digital system design, that have been reported [1], [12], [3], [13].